Message 129248 - Python tracker

Ezio and I discussed the implementation of alias lookup on IRC, and neither of us was able to point to the function that strips non-alphanumeric characters from encoding names.

It turns out that there are three "normalize" functions that are successively applied to the encoding name during evaluation of str.encode/str.decode.

1. normalize_encoding() in unicodeobject.c
2. normalizestring() in codecs.c
3. normalize_encoding() in encodings/__init__.py

Each performs a slightly different transformation and only the last one strips non-alphanumeric characters.
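
A rough way to see the effect from Python is sketched below. It assumes a Python 3 interpreter; only the third function is reachable from Python code, so the two C-level steps are exercised indirectly through codecs.lookup(). The sample spellings are my own illustration, not taken from the discussion. On the versions I have looked at, the Python-level function turns runs of punctuation into a single underscore rather than dropping them outright.

```python
import codecs
import encodings

# Illustrative spellings of one encoding that differ only in case and punctuation.
spellings = ["UTF-8", "utf_8", "Utf-8"]

# Step 3 from the list above, called directly on each spelling.
normalized = [encodings.normalize_encoding(s) for s in spellings]

# The full lookup path, which goes through the normalization steps in turn.
canonical = {codecs.lookup(s).name for s in spellings}

for spelling, norm in zip(spellings, normalized):
    print(f"{spelling!r} -> {norm!r}")
print(canonical)
```

All three spellings resolve to the same codec even though the strings passed in differ, which is the behaviour the layered normalization is there to provide.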

The complexity of codec lookup is comparable with that of the import mechanism!