> Accepting all common forms for
> encoding names means that you can usually give Python an encoding name
> from, e.g. a HTML page, or any other file or system that specifies an
> encoding.
I don't buy this argument. Running the attached script on http://www.iana.org/assignments/character-sets shows that there are hundreds of registered charset names that Python does not accept:
$ ./python.exe iana.py| wc -l
413
Any serious HTML or XML processing software should be based on the IANA character-sets file rather than on the ad-hoc list of aliases that made it into encodings/aliases.py.
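
For anyone who wants to reproduce the count without the attachment, here is a rough sketch of that kind of check. It is not the attached iana.py. It assumes the registry is the plain-text file with "Name:" and "Alias:" lines, and the helper names registered_names and unsupported are made up for illustration.

```python
import codecs
import re

# Assumption: the registry is plain text with lines such as
# "Name: ANSI_X3.4-1968   [RFC1345]" and "Alias: iso-ir-6".
_NAME_RE = re.compile(r"^(?:Name|Alias):[ \t]*(\S+)", re.MULTILINE)


def registered_names(registry_text):
    """Return every charset name and alias found in the registry text."""
    names = []
    for name in _NAME_RE.findall(registry_text):
        # "Alias: None" marks a charset without aliases; it is not a name.
        if name.lower() != "none":
            names.append(name)
    return names


def unsupported(names):
    """Return the names that codecs.lookup() does not recognize."""
    missing = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            missing.append(name)
    return missing
```

Printing one entry of unsupported(registered_names(text)) per line, with text holding the contents of a saved copy of the registry, and piping the output to wc -l should give a comparable count. The exact number depends on the Python version and on the current state of the registry.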