Just to clarify: Python can be built as a UCS2 or a UCS4 build (not UTF-16
vs. UTF-32).
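
For reference, the build type can be checked at runtime through sys.maxunicode. This is only a minimal sketch: the helper name build_kind is made up for illustration, and Python 3.3 and later no longer have the distinction, so they always report 0x10FFFF.

```python
import sys


def build_kind():
    """Return 'UCS2' or 'UCS4' (hypothetical helper, not part of Python)."""
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    return "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"


print(build_kind())
```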
Escaped literals are converted to the internal format using the
unicode-escape and raw-unicode-escape codecs.
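
To illustrate what these two codecs do with a non-BMP escape, here is a small sketch. It assumes Python 3.3 or later, where a string always holds full code points, so it shows the codec behaviour but not the UCS2/UCS4 difference itself.

```python
# Assumption: Python 3.3+ (no narrow/wide build distinction).
# The 10 ASCII bytes of the escape, as they would appear in source text.
escaped = b"\\U00010000"

via_unicode_escape = escaped.decode("unicode-escape")
via_raw_unicode_escape = escaped.decode("raw-unicode-escape")

# Both codecs understand the \UXXXXXXXX form and yield the same code point.
print(hex(ord(via_unicode_escape)))
print(via_unicode_escape == via_raw_unicode_escape)
```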
PYC files are written using the marshal module, which uses UTF-8 as
the encoding for Unicode objects.
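
A quick way to see this is to marshal a non-BMP string and look for its UTF-8 bytes in the result. This is a sketch only; the exact marshal layout is an implementation detail and is assumed here, not guaranteed.

```python
import marshal

text = u"\U00010000"
payload = marshal.dumps(text)

# Assumed implementation detail: the string body is stored as UTF-8 bytes.
print(text.encode("utf-8") in payload)

# The round trip should give back the same string.
print(marshal.loads(payload) == text)
```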
All of these codecs know about surrogates, so there must be a bug
somewhere in the Python tokenizer or compiler.
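
As a reminder of what "knowing about surrogates" means, this is the standard UTF-16 surrogate arithmetic that a UCS2 build relies on for non-BMP code points. The function name to_surrogates is hypothetical and only serves the illustration.

```python
def to_surrogates(code_point):
    """Split a non-BMP code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


print([hex(unit) for unit in to_surrogates(0x10000)])
```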
I checked on Linux using a UCS2 and a UCS4 build of Python 2.5: the
problem only shows up with the UCS4 build.