bpo-27413: add --no-ensure-ascii argument to json.tool by dhimmel · Pull Request #201 · python/cpython
test_ensure_ascii fails when the locale encoding is not UTF-8.
$ LC_ALL=en_US.iso88591 ./python -m test.regrtest -v -m test_ensure_ascii test_json
...
======================================================================
FAIL: test_ensure_ascii (test.test_json.test_tool.TestTool)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/test/test_json/test_tool.py", line 137, in test_ensure_ascii
    self.assertEqual(expect.splitlines(), json_stdout.splitlines())
AssertionError: Lists differ: [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'] != [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[39 chars]b7"']

First differing element 0:
b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'
b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[38 chars]0b7"'

- [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"']
+ [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\u00ce\\u00b4 \\u00f0\\u009d'
+  b'\\u0080\\u00b7"']

----------------------------------------------------------------------
The other tests pass only by accident. The input data is encoded as UTF-8, but json.tool decodes it with the locale encoding. json.tool then encodes the output with the same locale encoding, so the round trip gives back the initial UTF-8 bytes.
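The effect can be reproduced without changing the locale. This is only a sketch: it assumes `latin-1` stands in for the `en_US.iso88591` locale encoding and uses `json.loads`/`json.dumps` directly instead of running json.tool. The variable names are illustrative.

```python
import json

# The JSON document used by the test: a single string with non-ASCII characters.
original = '"\u00a7 \U0001f40d \u03b4 \U0001d037"'
utf8_bytes = original.encode("utf-8")

# Assumption: the locale encoding is latin-1, so the UTF-8 input is mis-decoded.
misdecoded = utf8_bytes.decode("latin-1")
obj = json.loads(misdecoded)

# With ensure_ascii=True every mis-decoded byte is escaped separately,
# which is what the failing test observes (\u00c2\u00a7 instead of \u00a7).
escaped = json.dumps(obj, ensure_ascii=True)

# With ensure_ascii=False the output is re-encoded with the same encoding,
# so the original UTF-8 bytes come back and the mistake stays hidden.
roundtrip = json.dumps(obj, ensure_ascii=False).encode("latin-1")

# What the test expects when the input is decoded correctly.
expected = json.dumps(json.loads(original), ensure_ascii=True)

print(escaped)
print(expected)
print(roundtrip == utf8_bytes)
```

The two encoding errors cancel out only when nothing looks at the decoded text in between, which is why escaping with ensure_ascii exposes the problem.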
Try passing escaped data that cannot be encoded with the locale encoding:
$ echo '["\u20ac"]' | LC_ALL=en_US.iso88591 ./python -m json.tool --no-ensure-ascii
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/serhiy/py/cpython/Lib/json/tool.py", line 64, in <module>
    main()
  File "/home/serhiy/py/cpython/Lib/json/tool.py", line 58, in main
    sort_keys=options.sort_keys,
  File "/home/serhiy/py/cpython/Lib/json/__init__.py", line 180, in dump
    fp.write(chunk)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)
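The same failure can be shown in isolation. A minimal sketch, again assuming `latin-1` in place of the locale encoding and calling `json.dumps` directly rather than json.tool:

```python
import json

# The input is pure ASCII; the euro sign only appears after the escape is decoded.
obj = json.loads('["\\u20ac"]')

# Without ASCII escaping, the output contains U+20AC as a literal character.
unescaped = json.dumps(obj, ensure_ascii=False)

try:
    unescaped.encode("latin-1")  # assumption: latin-1 is the output encoding
    encodable = True
except UnicodeEncodeError:
    encodable = False

# With ASCII escaping, the output can be written in any ASCII-compatible encoding.
escaped_bytes = json.dumps(obj, ensure_ascii=True).encode("latin-1")

print(encodable)
print(escaped_bytes)
```

So with --no-ensure-ascii, valid ASCII input can produce output that the locale encoding cannot represent.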