Message 69576 - Python tracker

Just to clarify: Python can be built as a UCS2 or a UCS4 build (not UTF-16
vs. UTF-32).

The conversions done from the literal escaped representation to the
internal format are done using the unicode-escape and raw-unicode-escape
codecs.
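As a rough illustration of what those two codecs do, the sketch below decodes escaped text by hand. It assumes a current Python 3 interpreter rather than the 2.5 builds discussed here, and the sample escapes are chosen only for the example.

```python
import codecs

# Escaped text as it would appear between the quotes of a literal,
# held as plain ASCII bytes. The sample values are illustrative only.
escaped_astral = b"\\U00010000"   # a code point outside the BMP
escaped_bmp = b"\\u20ac"          # a code point inside the BMP

# unicode-escape handles \uXXXX, \UXXXXXXXX and the usual backslash escapes.
decoded = codecs.decode(escaped_astral, "unicode_escape")

# raw-unicode-escape handles only \uXXXX and \UXXXXXXXX and leaves
# other backslash sequences such as \n untouched.
raw_decoded = codecs.decode(escaped_bmp, "raw_unicode_escape")
raw_untouched = codecs.decode(b"\\n", "raw_unicode_escape")

print(hex(ord(decoded)))
print(hex(ord(raw_decoded)))
print(len(raw_untouched))
```

On the narrow (UCS2) builds of that era, the first result would have been stored as two surrogate code units rather than one character.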

PYC files are written using the marshal module, which uses UTF-8 as
the encoding for Unicode objects.
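A quick way to see this, sketched on the assumption of a current CPython 3 interpreter (the marshal format is version-specific and not meant to be inspected like this in real code), is to marshal a non-BMP string and look for its UTF-8 bytes in the result:

```python
import marshal

# A single character outside the BMP; the value is illustrative only.
text = "\U00010000"

# Serialize the string the same way it would be stored in a PYC file.
payload = marshal.dumps(text)

# The UTF-8 form of U+10000 is the four bytes F0 90 80 80.
utf8_bytes = text.encode("utf-8")
contains_utf8 = utf8_bytes in payload

# Round-trip to confirm the value survives marshalling unchanged.
restored = marshal.loads(payload)

print(contains_utf8)
print(restored == text)
```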

All of these codecs know about surrogates, so there must be a bug
somewhere in the Python tokenizer or compiler.
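For reference, the surrogate arithmetic those codecs have to get right can be sketched as follows. The helper names are hypothetical and exist only for this example; they are not functions from Python's codec machinery.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))
print(hex(from_surrogate_pair(high, low)))
```

A UCS2 build stores such a character as the two code units, while a UCS4 build stores the single combined code point, so any step that mishandles the combination would show up on only one kind of build.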

I checked on Linux using a UCS2 and a UCS4 build of Python 2.5: the
problem only shows up with the UCS4 build.
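To tell which kind of build an interpreter is, one option is to look at sys.maxunicode. The sketch below uses a hypothetical helper name; note that from Python 3.3 onward the UCS2/UCS4 build distinction no longer exists and the value is always 0x10FFFF.

```python
import sys


def describe_build(max_unicode):
    """Classify a build by its sys.maxunicode value (hypothetical helper)."""
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    if max_unicode == 0xFFFF:
        return "UCS2"
    if max_unicode == 0x10FFFF:
        return "UCS4"
    return "unknown"


print(describe_build(sys.maxunicode))
```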