◐ Shell
clean mode source ↗

gh-121650 : detect newlines in headers by basbloemsaat · Pull Request #121812 · python/cpython

ZeroIntensity

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to confirm that this PR fixes #121650. The original reproducer still contains the embedded newline:

from email import message_from_string
from email.policy import default

email_in = """\
To: incoming+tag@me.example.com
From: External Sender <sender@them.example.com>
Subject: Here's an =?UTF-8?Q?embedded_newline=0A?=
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

<html>
<head><title>An embeded newline</title></head>
<body>
  <p>I sent you an embedded newline in the subject. How do you like that?!</p>
</body>
</html>
"""

msg = message_from_string(email_in, policy=default)
msg = message_from_string(email_in, policy=default)
for header, value in msg.items():
    del msg[header]
    msg[header] = value
email_out = str(msg)
print(email_out)
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
…0FkCh.rst

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

@basbloemsaat

I wasn't able to confirm that this PR fixes #121650. The original reproducer still contains the embedded newline:
...

I missed headers that were parsed from a message, as in original issue. I updated the fix and the tests.

ZeroIntensity

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that this fixes #121650. I'm pretty sure this is a security fix (as you could previously inject email headers using this method), so this should need a backport all the way to 3.8

@encukou

encukou

encukou previously approved these changes Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good to me! Thank you for digging into it!

@encukou

I take back the review. There's more to this, unfortunately :(

Here's another reproducer:

from email import message_from_string
from email.policy import default

email_in = """\
To: incoming+tag@me.example.com
From: External Sender <sender@them.example.com>  =?UTF-8?Q?embedded_newline=0A?=Smuggled-Data: Bad
Subject: foo <bar> Here's an
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

<html>
<head><title>An embeded newline</title></head>
<body>
  <p>I sent you an embedded newline in the subject. How do you like that?!</p>
</body>
</html>
"""

msg = message_from_string(email_in, policy=default)
print(msg)
for header, value in msg.items():
    del msg[header]
    msg[header] = value
email_out = str(msg)
print(email_out)

@basbloemsaat

I take back the review. There's more to this, unfortunately :(

I'll look into this...

…com:basbloemsaat/cpython into fix-issue-121650-detect-newlines-in-headers

@basbloemsaat

@encukou I tried all header types. This eliminates all newlines. Two notes:

  • date headers (and derivatives) parse the date, and the offending code is eliminated that way
  • the MIME-Version header doesn't decode the encoded newline, so it doesn't break the message. Improving the parsing of that header would mean rewriting the parsing code to do so, but I think that goes beyond the scope of this ticket.

@encukou

After reading up on the email module, I propose to fix the issue in a different part of the code: see #122233.

@encukou

Closing in favour of #122233.
Thank you for the work here, @basbloemsaat! And sorry that I “stole” the issue.

@basbloemsaat basbloemsaat deleted the fix-issue-121650-detect-newlines-in-headers branch

August 1, 2024 15:06