Amaury, you are absolutely correct: \ud801 on its own is a lone
surrogate, not a valid Unicode character. However, I am not giving
Python \ud801; I am giving Python '𐑑' (== '\U00010451').
I am attaching a different short example, u.py, that demonstrates that
Python is mishandling UTF-8 both at the interactive terminal and in scripts.
The output should be the same, but on Python 3.1.1 compiled for wide
Unicode it reports two different values. As someone on #python-dev
found, '𐑑'.encode('utf-16').decode('utf-16') outputs the correct value.
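
For reference, here is a minimal sketch of the round trips I would expect to hold. It is not the attached u.py, and the variable names are only illustrative. It assumes a wide-Unicode build (or Python 3.3+), where '\U00010451' is a single code point and ord() accepts it.

```python
# Illustrative check only; this is NOT the attached u.py.
# Assumes a wide-Unicode build (or Python 3.3+), so s has length 1.
s = '\U00010451'

# Encoding and decoding with the same codec should give back the same string.
utf8_round_trip = s.encode('utf-8').decode('utf-8')
utf16_round_trip = s.encode('utf-16').decode('utf-16')

# UTF-16 represents this code point as a surrogate pair; the first half
# is the \ud801 mentioned above, which is only valid as part of the pair.
high, low = divmod(ord(s) - 0x10000, 0x400)
surrogates = (0xD800 + high, 0xDC00 + low)

print(ascii(s), ascii(utf8_round_trip), ascii(utf16_round_trip))
print([hex(u) for u in surrogates])
```

If the terminal or script path decodes the UTF-8 input correctly, all three values on the first line should be identical.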