Historically, our handling of international domain names (IDNs) in the
ssl module has been very broken. The flow went like:
1. User passes server_hostname= to the SSLSocket/SSLObject
constructor. This gets normalized to an A-label by using the
PyArg_Parse "et" mode: bytes objects get passed through
unchanged (assumed to already be A-labels); str objects get run
through .encode("idna") to convert them into A-labels.
2. newPySSLSocket takes this A-label, and for some reason decodes
it *back* to a U-label, and stores that as the object's
server_hostname attribute.
3. Later, this U-label server_hostname attribute gets passed to
match_hostname, to compare against the hostname seen in the
certificate. But certificates contain A-labels, and match_hostname
expects to be passed an A-label, so this doesn't work at all.
This PR fixes the problem by removing the pointless decoding at step
2, so that internally we always use A-labels, which matches how
internet protocols are designed in general: A-labels are used
everywhere internally and on-the-wire, and U-labels are basically just
for user interfaces.
This also matches the general advice to handle encoding/decoding once
at the edges, though for backwards-compatibility we continue to use
'str' objects to store A-labels, even though they're now always
ASCII. Technically there is a minor compatibility break here: if a
user examines the .server_hostname attribute of an ssl-wrapped socket,
then previously they would have seen a U-label like "pythön.org", and
now they'll see an A-label like "xn--pythn-mua.org". But this only
affects non-ASCII domain names, which have never worked in the first
place, so it seems unlikely that anyone is relying on the old
behavior.
This PR also adds an end-to-end test for IDN hostname
validation. Previously there were no tests for this functionality.
Fixes bpo-28414.