[bpo-28414] In SSL module, store server_hostname as an A-label#3010
[bpo-28414] In SSL module, store server_hostname as an A-label#3010njsmith wants to merge 3 commits into
Conversation
Lukasa
left a comment
There was a problem hiding this comment.
This generally looks reasonable to me. Of course, I don't have the commit bit, so my enthusiasm is almost worthless. 😉
Sorry, something went wrong.
Fine, fine, all right, done. (And why won't github let me reply to your comment? The reply box is just missing. It Is a Mystery 😕) |
Sorry, something went wrong.
|
I suggested the same solution in pyca/cryptography#3357 (comment) and in my WIP PEP. I'm glad you had the same idea. There is also the SNI callback. I like to change it to IDN A-Label, too. |
Sorry, something went wrong.
There was a problem hiding this comment.
This change is backwards incompatible but the best option we have. Please update the documentation of SSLSocket.server_hostname and state that the variable now contains the hostname in ACE instead of IDN U-label. The whatsnew doc for 3.7 needs an entry with a incompatibility warning, too.
Sorry, something went wrong.
|
@decaz AFAIK I'm waiting for @tiran or @alex to make the next move :-). I guess some conflicting change has been merged in the mean time. I don't understand the conflict in |
Sorry, something went wrong.
|
test_ssl has changed a lot because CPython now required threading. In the past code was indented one additional level. |
Sorry, something went wrong.
Historically, our handling of international domain names (IDNs) in the
ssl module has been very broken. The flow went like:
1. User passes server_hostname= to the SSLSocket/SSLObject
constructor. This gets normalized to an A-label by using the
PyArg_Parse "et" mode: bytes objects get passed through
unchanged (assumed to already be A-labels); str objects get run
through .encode("idna") to convert them into A-labels.
2. newPySSLSocket takes this A-label, and for some reason decodes
it *back* to a U-label, and stores that as the object's
server_hostname attribute.
3. Later, this U-label server_hostname attribute gets passed to
match_hostname, to compare against the hostname seen in the
certificate. But certificates contain A-labels, and match_hostname
expects to be passed an A-label, so this doesn't work at all.
This PR fixes the problem by removing the pointless decoding at step
2, so that internally we always use A-labels, which matches how
internet protocols are designed in general: A-labels are used
everywhere internally and on-the-wire, and U-labels are basically just
for user interfaces.
This also matches the general advice to handle encoding/decoding once
at the edges, though for backwards-compatibility we continue to use
'str' objects to store A-labels, even though they're now always
ASCII. Technically there is a minor compatibility break here: if a
user examines the .server_hostname attribute of an ssl-wrapped socket,
then previously they would have seen a U-label like "pythön.org", and
now they'll see an A-label like "xn--pythn-mua.org". But this only
affects non-ASCII domain names, which have never worked in the first
place, so it seems unlikely that anyone is relying on the old
behavior.
This PR also adds an end-to-end test for IDN hostname
validation. Previously there were no tests for this functionality.
Fixes bpo-28414.
|
Okay, I think I managed to rebase this. @tiran, want to hit merge? |
Sorry, something went wrong.
|
@tiran please review. |
Sorry, something went wrong.
|
@asvetlov I'll make sure a complete patch lands in 3.7. This is a) a breaking change and b) I like to fix all of the SSL module at once. The SNI callback is still broken. I'll drop a mail to python-dev as soon as I have some spare time. |
Sorry, something went wrong.
|
Thanks! |
Sorry, something went wrong.
|
The patch does not work as I expected. Shouldn't |
Sorry, something went wrong.
|
I just looked at the code again, and I can't see how that could happen...
Are you positive you're running the right branch?
…On Dec 30, 2017 5:30 AM, "Christian Heimes" ***@***.***> wrote:
The patch does not work as I expected. Shouldn't sock.server_hostname
return www.strasse.de instead of 'www.straße.de <http://www.strasse.de>'`?
>>> import ssl, socket
>>> ctx = ssl.create_default_context()
>>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de <http://www.strasse.de>')
>>> sock.server_hostname
'www.straße.de <http://www.strasse.de>'
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3010 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAlOaG6Zvr1nIbKGK0teXWevZsuG8VrUks5tFjrhgaJpZM4Ou2c2>
.
|
Sorry, something went wrong.
|
Absolutely, I'm running 948bc68 on branch pr/3010: |
Sorry, something went wrong.
|
Oh, blah, I see. It's a problem in the interaction between the Python and C level modules. I fixed the C level
Two options occur to me:
The latter is a larger change and riskier, but maybe also an independently valuable simplification? I'm looking through the code trying to figure out what would change if we eagerly allocated the Most or all of the socket methods have explicit checks like: "if This should be extremely rare. In practice, What else would change? I guess right now, you can wrap an unconnected socket, and then assign to the I guess right now you can wrap a UDP socket in an What do you think? |
Sorry, something went wrong.
|
The SSLSocket / SSLObject relationship became complicated over the years. I don't like the fact that SSLObject has so many levels of indirection, see https://bugs.python.org/issue24334 We could also go for a variation of your first idea and perform IDNA encoding in Python code only. In this variant, C code accepts just ASCII text or ASCII bytes. I've to think about it and will get back to you next year. |
Sorry, something went wrong.
|
I don't think doing it all in Python would be simpler. We'd still have to
do it twice (once for SSLObject and once for SSLSocket), and then we'd need
some additional hoop-jumping in the C code to make sure we were working
with an ascii-only Unicode string.
…On Dec 31, 2017 3:21 AM, "Christian Heimes" ***@***.***> wrote:
The SSLSocket / SSLObject relationship became complicated over the years.
I don't like the fact that SSLObject has so many levels of indirection, see
https://bugs.python.org/issue24334
We could also go for a variation of your first idea and perform IDNA
encoding in Python code *only*. In this variant, C code accepts just
ASCII text or ASCII bytes. I've to think about it and will get back to you
next year.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3010 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAlOaFmUPafjJKn3hXc8O_dP-H2XXcprks5tF24zgaJpZM4Ou2c2>
.
|
Sorry, something went wrong.
|
I've spent some time on the train to refactor code and to implement your second proposal. A first proof of concept implementation works mostly. I was able to get rid of SSLObject for socket-based I/O and to instantiate the SSL object directly. I'm now hitting a road block: |
Sorry, something went wrong.
|
Oh yeah, SSL shutdown is a whole other mess. It's not as simple as "the current behavior is buggy" – the semantics Python implements now are what HTTPS implementations universally use and expect, and if We definitely don't want to touch that in this branch, or in 3.7 – we should preserve the current behavior where (If you're curious, I did come up with some kind of solution to these issues in Trio, see the docs on |
Sorry, something went wrong.
|
We have to touch the close part in order to fix session handling. I haven't looked into the I/O part of SSLSocket yet ... I'm now starting to regret that I ever did. I understand why sessions are partly broken with OpenSSL 1.1.0. A forceful close of I/O without any proper shutdown invalidates the session. There are fourish ways to close a TLS connection
How do you feel about a |
Sorry, something went wrong.
|
@tiran Please make a separate issue/PR for any changes in close handling, it's orthogonal to IDNA support in server_hostname and deserves its own proper attention :-) For right now, maybe I should just do the quick hack version so this can go in, and then we'll worry about the refactoring separately? |
Sorry, something went wrong.
|
I have opened PR #5128, which addresses the issue of |
Sorry, something went wrong.
Historically, our handling of international domain names (IDNs) in the
ssl module has been very broken. The flow went like:
User passes server_hostname= to the SSLSocket/SSLObject
constructor. This gets normalized to an A-label by using the
PyArg_Parse "et" mode: bytes objects get passed through
unchanged (assumed to already be A-labels); str objects get run
through .encode("idna") to convert them into A-labels.
newPySSLSocket takes this A-label, and for some reason decodes
it back to a U-label, and stores that as the object's
server_hostname attribute.
Later, this U-label server_hostname attribute gets passed to
match_hostname, to compare against the hostname seen in the
certificate. But certificates contain A-labels, and match_hostname
expects to be passed an A-label, so this doesn't work at all.
This PR fixes the problem by removing the pointless decoding at step
2, so that internally we always use A-labels, which matches how
internet protocols are designed in general: A-labels are used
everywhere internally and on-the-wire, and U-labels are basically just
for user interfaces.
This also matches the general advice to handle encoding/decoding once
at the edges, though for backwards-compatibility we continue to use
'str' objects to store A-labels, even though they're now always
ASCII. Technically there is a minor compatibility break here: if a
user examines the .server_hostname attribute of an ssl-wrapped socket,
then previously they would have seen a U-label like "pythön.org", and
now they'll see an A-label like "xn--pythn-mua.org". But this only
affects non-ASCII domain names, which have never worked in the first
place, so it seems unlikely that anyone is relying on the old
behavior.
This PR also adds an end-to-end test for IDN hostname
validation. Previously there were no tests for this functionality.
Fixes bpo-28414.
https://bugs.python.org/issue28414