Inconsistent handling of non-ASCII characters in encodings.normalize_encoding()
Bug report
#83518 changed the handling of non-ASCII characters in encodings.normalize_encoding(), but the function is still inconsistent with codecs.lookup() and is not even self-consistent. For example:
>>> import encodings
>>> encodings.normalize_encoding('a¤b')
'a_b'
>>> encodings.normalize_encoding('aæb')
'ab'
>>> encodings.normalize_encoding('a-¤')
'a'
>>> encodings.normalize_encoding('a-æ')
'a_'
>>> encodings.normalize_encoding('a-¤-b')
'a_b'
>>> encodings.normalize_encoding('a-æ-b')
'a__b'
The result can even end with an underscore ('a_') or contain repeated underscores in the middle ('a__b').
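To make it easier to check other inputs, here is a minimal sketch that runs a list of names through encodings.normalize_encoding() and flags leading, trailing, or repeated underscores in the result. The helpers `underscore_anomalies` and `survey` are hypothetical names made up for this report, not part of the stdlib. The printed results depend on the Python version, so no expected output is given; warnings are suppressed and exceptions are recorded in case a given version rejects non-ASCII names.

```python
import encodings
import warnings


def underscore_anomalies(normalized):
    """Return labels describing odd underscore placement in a normalized name."""
    problems = []
    if normalized.startswith("_"):
        problems.append("leading")
    if normalized.endswith("_"):
        problems.append("trailing")
    if "__" in normalized:
        problems.append("repeated")
    return problems


def survey(names):
    """Map each name to (normalized form, anomalies).

    If normalize_encoding() raises, the normalized form is None and the
    anomalies list holds the exception type name instead.
    """
    results = {}
    for name in names:
        try:
            with warnings.catch_warnings():
                # Some versions may warn about non-ASCII names; ignore that here.
                warnings.simplefilter("ignore")
                normalized = encodings.normalize_encoding(name)
        except Exception as exc:
            results[name] = (None, [type(exc).__name__])
            continue
        results[name] = (normalized, underscore_anomalies(normalized))
    return results


if __name__ == "__main__":
    samples = ["a¤b", "aæb", "a-¤", "a-æ", "a-¤-b", "a-æ-b"]
    for name, (normalized, problems) in survey(samples).items():
        print(f"{name!r} -> {normalized!r} {problems}")
```

Any entry with a non-empty list is a case where the normalized name has misplaced underscores, or where the call raised.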
cc @malemburg, @vstinner, @shihai1991