gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses by medmunds · Pull Request #122540 · python/cpython
Email generators had been incorrectly flattening non-ASCII email addresses to RFC 2047 encoded-word format, leaving them undeliverable. (RFC 2047 prohibits use of encoded-word in an addr-spec.) This change raises a ValueError when attempting to flatten an EmailMessage with a non-ASCII addr-spec and a policy with utf8=False. (Exception: If the non-ASCII address originated from parsing a message, it will be flattened as originally parsed, without error.) Non-ASCII email addresses are supported when using a policy with utf8=True (such as email.policy.SMTPUTF8) under RFCs 6531 and 6532. Non-ASCII email address domains (but not localparts) can also be used with non-SMTPUTF8 policies by encoding the domain as an IDNA A-label. (The email package does not perform this encoding, because it cannot know whether the caller wants IDNA 2003, IDNA 2008, or some other variant such as UTS #46.)
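A minimal sketch of the two supported paths described above, assuming Python 3.7+. With utf8=True (email.policy.SMTPUTF8) the localpart and domain are written as raw UTF-8. With utf8=False (email.policy.SMTP) only the domain has an ASCII form, so the caller converts it to an IDNA A-label first. The helper names `idna_domain` and `build_message` are hypothetical. The stdlib `"idna"` codec implements IDNA 2003; that choice is an assumption here, and a caller needing IDNA 2008 or UTS #46 would swap in a different encoder.

```python
import email.policy
from email.headerregistry import Address
from email.message import EmailMessage


def idna_domain(domain: str) -> str:
    """Convert a domain to ASCII A-labels (hypothetical helper).

    Assumption: Python's built-in 'idna' codec (IDNA 2003) is acceptable.
    """
    return domain.encode("idna").decode("ascii")


def build_message(display_name: str, username: str, domain: str,
                  smtputf8: bool) -> bytes:
    """Flatten a one-recipient message under SMTPUTF8 or plain SMTP."""
    msg = EmailMessage()
    msg["Subject"] = "Hello"
    if smtputf8:
        # RFC 6531/6532: non-ASCII localpart and domain are sent as UTF-8.
        msg["To"] = Address(display_name, username, domain)
        return msg.as_bytes(policy=email.policy.SMTPUTF8)
    # utf8=False: a non-ASCII localpart has no ASCII representation
    # (encoded-words are prohibited in an addr-spec), so reject it here.
    if not username.isascii():
        raise ValueError(f"localpart {username!r} needs an SMTPUTF8 policy")
    # The domain can be made ASCII as an IDNA A-label. The display name may
    # still be non-ASCII; it is emitted as an RFC 2047 encoded-word.
    msg["To"] = Address(display_name, username, idna_domain(domain))
    return msg.as_bytes(policy=email.policy.SMTP)
```

For example, `build_message("Jörg", "jorg", "bücher.example", smtputf8=False)` yields a pure-ASCII `To` header whose address is `jorg@xn--bcher-kva.example`, while the same call with `username="jörg"` raises unless `smtputf8=True`.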
picnixz changed the title from "gh-83938: Stop incorrectly RFC 2047 encoding non-ASCII email addresses" to "gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses".
The MIME parameter folder doesn't use the encoding check performed by the code that now follows it; it does its own. So it makes more sense to take that branch first. This will simplify subsequent changes.
This is a more complete fix, covering any syntax part where encoded words are not permitted, and the doc changes are adjusted accordingly. There is also no need for a new exception, since HeaderWriteError already exists. The fix itself is to use a separate code loop to fold parts that may not have encoded words, guaranteeing that we do not do incorrect encoding. This opens a door to simplifying the main folding loop, but that is a much bigger refactoring job better left for another time.
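Because flattening a non-ASCII addr-spec under utf8=False now fails with an error rather than producing encoded words, a caller may prefer to choose the policy up front instead of catching the exception. The sketch below assumes the message uses `email.policy.default` (so `get_all` returns header objects with an `.addresses` attribute) and that the listed headers are the ones the caller sets; `ADDRESS_HEADERS`, `needs_smtputf8`, and `choose_policy` are hypothetical names. It is deliberately conservative: any non-ASCII addr-spec selects SMTPUTF8, even though a non-ASCII domain alone could instead be converted to an A-label.

```python
import email.policy
from email.message import EmailMessage
from email.policy import Policy

# Assumption: these are the address headers the caller populates.
ADDRESS_HEADERS = ("From", "Sender", "Reply-To", "To", "Cc", "Bcc")


def needs_smtputf8(msg: EmailMessage) -> bool:
    """Return True if any address header contains a non-ASCII addr-spec."""
    for name in ADDRESS_HEADERS:
        for header in msg.get_all(name, []):
            for addr in header.addresses:
                # Only the addr-spec matters: a non-ASCII display name can
                # still be RFC 2047 encoded under a utf8=False policy.
                if not addr.addr_spec.isascii():
                    return True
    return False


def choose_policy(msg: EmailMessage) -> Policy:
    """Pick SMTPUTF8 only when an addr-spec actually requires it."""
    return email.policy.SMTPUTF8 if needs_smtputf8(msg) else email.policy.SMTP
```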
bitdancer pushed a commit that referenced this pull request
…122477) The email.headerregistry.Address constructor raised an error if addr_spec contained a non-ASCII character. (But it fully supports non-ASCII in the separate username and domain args.) This change removes the error for a non-ASCII addr_spec, as well as the Defect that triggered it. In the Unicode era, non-ASCII is not a defect, though it is an error when an attempt is made to serialize it to ASCII. The serialization issue was handled in #122540.
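A small sketch of the constructor behavior described in this commit message. Passing non-ASCII through `username` and `domain` works regardless of version; passing it through `addr_spec` works once this change is present. The hypothetical helper `address_from_spec` tolerates older versions by falling back to splitting on the last `@`, which assumes a simple `local@domain` spec with no quoted `@` in the localpart. The fallback catches `ValueError` because the email package's defect classes derive from it.

```python
from email.headerregistry import Address

# Works on all versions: non-ASCII supplied via the separate args.
addr = Address(display_name="Jörg", username="jörg", domain="bücher.example")


def address_from_spec(addr_spec: str, display_name: str = "") -> Address:
    """Build an Address from a possibly non-ASCII addr_spec (hypothetical).

    With the change above, Address(addr_spec=...) accepts non-ASCII directly.
    Older versions raise a defect (a ValueError subclass); in that case fall
    back to username/domain, assuming a simple local@domain spec.
    """
    try:
        return Address(display_name, addr_spec=addr_spec)
    except ValueError:
        username, _, domain = addr_spec.rpartition("@")
        return Address(display_name, username, domain)
```

Either route yields the same `username`, `domain`, and `addr_spec`; whether the result can be serialized then depends only on the policy's utf8 setting.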