gh-121650 : detect newlines in headers by basbloemsaat · Pull Request #121812 · python/cpython
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to confirm that this PR fixes #121650. The original reproducer still contains the embedded newline:
from email import message_from_string from email.policy import default email_in = """\ To: incoming+tag@me.example.com From: External Sender <sender@them.example.com> Subject: Here's an =?UTF-8?Q?embedded_newline=0A?= Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 <html> <head><title>An embeded newline</title></head> <body> <p>I sent you an embedded newline in the subject. How do you like that?!</p> </body> </html> """ msg = message_from_string(email_in, policy=default) msg = message_from_string(email_in, policy=default) for header, value in msg.items(): del msg[header] msg[header] = value email_out = str(msg) print(email_out)
I wasn't able to confirm that this PR fixes #121650. The original reproducer still contains the embedded newline:
...
I missed headers that were parsed from a message, as in original issue. I updated the fix and the tests.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed that this fixes #121650. I'm pretty sure this is a security fix (as you could previously inject email headers using this method), so this should need a backport all the way to 3.8
encukou
previously approved these changes
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good to me! Thank you for digging into it!
I take back the review. There's more to this, unfortunately :(
Here's another reproducer:
from email import message_from_string from email.policy import default email_in = """\ To: incoming+tag@me.example.com From: External Sender <sender@them.example.com> =?UTF-8?Q?embedded_newline=0A?=Smuggled-Data: Bad Subject: foo <bar> Here's an Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 <html> <head><title>An embeded newline</title></head> <body> <p>I sent you an embedded newline in the subject. How do you like that?!</p> </body> </html> """ msg = message_from_string(email_in, policy=default) print(msg) for header, value in msg.items(): del msg[header] msg[header] = value email_out = str(msg) print(email_out)
@encukou I tried all header types. This eliminates all newlines. Two notes:
- date headers (and derivatives) parse the date, and the offending code is eliminated that way
- the MIME-Version header doesn't decode the encoded newline, so it doesn't break the message. Improving the parsing of that header would mean rewriting the parsing code to do so, but I think that goes beyond the scope of this ticket.
After reading up on the email module, I propose to fix the issue in a different part of the code: see #122233.
Closing in favour of #122233.
Thank you for the work here, @basbloemsaat! And sorry that I “stole” the issue.
basbloemsaat
deleted the
fix-issue-121650-detect-newlines-in-headers
branch