[bpo-28414] Make all hostnames in SSL module IDN A-labels by tiran · Pull Request #5128 · python/cpython
Historically, our handling of international domain names (IDNs) in the
ssl module has been very broken. The flow went like:
1. User passes server_hostname= to the SSLSocket/SSLObject
constructor. This gets normalized to an A-label by using the
PyArg_Parse "et" mode: bytes objects get passed through
unchanged (assumed to already be A-labels); str objects get run
through .encode("idna") to convert them into A-labels.
2. newPySSLSocket takes this A-label, and for some reason decodes
it *back* to a U-label, and stores that as the object's
server_hostname attribute.
3. Later, this U-label server_hostname attribute gets passed to
match_hostname, to compare against the hostname seen in the
certificate. But certificates contain A-labels, and match_hostname
expects to be passed an A-label, so this doesn't work at all.
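The three steps above can be sketched in pure Python. This is only an illustration of the round trip, not the C code in the ssl module: the helper names are hypothetical, and a plain string comparison stands in for match_hostname, which does more than this (for example wildcard handling).

```python
def to_a_label(hostname):
    """Step 1: normalize to an A-label, roughly what the "et" parsing does.

    Hypothetical helper: bytes are assumed to already be A-labels,
    str objects go through the built-in "idna" codec.
    """
    if isinstance(hostname, bytes):
        return hostname.decode("ascii")
    return hostname.encode("idna").decode("ascii")


def old_stored_hostname(hostname):
    """Step 2 (old behavior): decode the A-label back to a U-label."""
    return to_a_label(hostname).encode("ascii").decode("idna")


def new_stored_hostname(hostname):
    """Behavior after this PR: keep the A-label as is."""
    return to_a_label(hostname)


# Step 3: the certificate carries the A-label form of the name.
cert_name = "xn--pythn-mua.org"

old = old_stored_hostname("pythön.org")
new = new_stored_hostname("pythön.org")

print(old == cert_name)  # False: a U-label is compared with an A-label
print(new == cert_name)  # True: both sides are A-labels
```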
This PR fixes the problem by removing the pointless decoding at step
2, so that internally we always use A-labels, which matches how
internet protocols are designed in general: A-labels are used
everywhere internally and on-the-wire, and U-labels are basically just
for user interfaces.
This also matches the general advice to handle encoding/decoding once
at the edges, though for backwards-compatibility we continue to use
'str' objects to store A-labels, even though they're now always
ASCII. Technically there is a minor compatibility break here: if a
user examines the .server_hostname attribute of an ssl-wrapped socket,
then previously they would have seen a U-label like "pythön.org", and
now they'll see an A-label like "xn--pythn-mua.org". But this only
affects non-ASCII domain names, which have never worked in the first
place, so it seems unlikely that anyone is relying on the old
behavior.
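Code that still wants to show the U-label to a user can convert at the edge. A minimal sketch, assuming the built-in "idna" codec is acceptable for display purposes; to_display_name is a hypothetical helper, not part of the ssl module:

```python
def to_display_name(a_label):
    """Convert an ASCII A-label hostname to its U-label form for display."""
    return a_label.encode("ascii").decode("idna")


# The value below is what .server_hostname would now hold for "pythön.org".
print(to_display_name("xn--pythn-mua.org"))  # pythön.org

# Plain ASCII hostnames pass through unchanged.
print(to_display_name("example.org"))  # example.org
```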
This PR also adds an end-to-end test for IDN hostname
validation. Previously there were no tests for this functionality.
Fixes bpo-28414.
All test certs must be generated by CPython's own test helper.
Signed-off-by: Christian Heimes <christian@python.org>
tiran changed the title from "[bpo-28414][WIP] Make all hostnames in SSL module IDN A-labels" to "[bpo-28414] Make all hostnames in SSL module IDN A-labels"
Drop extra code for PEP 543 future compatibility in sni callback
Use callable() and encode_hostname in shim for old SNI callback.
Signed-off-by: Christian Heimes <christian@python.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
) Previously, the ssl module stored international domain names (IDNs) as U-labels. This is problematic for a number of reasons -- for example, it made it impossible for users to use a different version of IDNA than the one built into Python. After this change, we always convert to A-labels as soon as possible, and use them for all internal processing. In particular, the server_hostname attribute is now an A-label, and on the server side there's a new sni_callback that receives the SNI servername as an A-label rather than a U-label. (cherry picked from commit 11a1493) Co-authored-by: Christian Heimes <christian@python.org>
njsmith pushed a commit that referenced this pull request
…H-5843) Previously, the ssl module stored international domain names (IDNs) as U-labels. This is problematic for a number of reasons -- for example, it made it impossible for users to use a different version of IDNA than the one built into Python. After this change, we always convert to A-labels as soon as possible, and use them for all internal processing. In particular, the server_hostname attribute is now an A-label, and on the server side there's a new sni_callback that receives the SNI servername as an A-label rather than a U-label. (cherry picked from commit 11a1493) Co-authored-by: Christian Heimes <christian@python.org>
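On the server side, the sni_callback mentioned in the commit message might be used roughly as follows. This is a sketch, assuming Python 3.7 or later where SSLContext.sni_callback is available; make_sni_callback, the contexts mapping, and seen_names are hypothetical names chosen for illustration.

```python
import ssl


def make_sni_callback(contexts, seen):
    """Build a callback that picks an SSLContext by A-label server name.

    contexts: hypothetical mapping of A-label hostname -> ssl.SSLContext
    seen: list that records every server name the callback receives
    """
    def sni_callback(ssl_socket, server_name, initial_context):
        # server_name is an A-label str such as "xn--pythn-mua.org",
        # or None if the client sent no SNI extension.
        seen.append(server_name)
        chosen = contexts.get(server_name)
        if chosen is not None:
            ssl_socket.context = chosen
        return None  # None lets the handshake continue

    return sni_callback


seen_names = []
server_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# An empty mapping here: every connection keeps the initial context.
server_context.sni_callback = make_sni_callback({}, seen_names)
```

Keying the mapping by A-label means no IDNA conversion is needed inside the callback itself.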