> According to the Unicode standard the high and low surrogate halves used
> by UTF-16 (...)
Yes, but in Python 3, the U+DC80..U+DCFF range is used to store undecodable bytes.
E.g. b'abc\xff'.decode('ascii', 'surrogateescape') gives 'abc\udcff'.
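For illustration, a minimal Python 3 sketch of that round trip. The variable names are only for this example and do not come from any patch:

```python
# Assumes Python 3.1+, where the 'surrogateescape' error handler exists.
raw = b'abc\xff'

# The undecodable byte 0xFF is mapped to the lone low surrogate U+DCFF.
decoded = raw.decode('ascii', 'surrogateescape')

# Encoding with the same handler restores the original byte.
restored = decoded.encode('ascii', 'surrogateescape')

# A strict encoder refuses the lone surrogate instead of writing it out.
try:
    decoded.encode('utf-8')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

So a string holding such surrogates is not valid Unicode text, but it can carry the original bytes through the str type and back without loss.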
> Anyway, as you remark, my approach is a _patch_, designed to make python
> (2.x) work in an unicode environment, with the least amount of code
> change, for those willing to commit such a patch.
Python 2.7 is out and I think it is too late to fix Python 2. Anyway, Python 2
uses bytes for sys.path and other paths, so the problem only occurs if the user
specifies unicode paths.
> In 3.x you may want to do things differently.
I chose to rewrite the C code to manipulate unicode paths instead of byte
paths => #9425