gh-128110: Fix rfc2047 handling in email parser address headers by medmunds · Pull Request #130749 · python/cpython

medmunds

RFC 2047 Section 6.2 requires that "any 'linear-white-space' that
separates a pair of adjacent 'encoded-word's is ignored." The modern
header value parser correctly implements that for unstructured headers,
but had missed a case in structured headers. This could cause a parsed
address header to include extraneous spaces in a display-name.

Fixed in get_atom() by converting a trailing CFWSList token after an
encoded-word to an EWWhiteSpaceTerminal if another encoded-word follows.

Deliberately left similar code in get_dotatom() unmodified. A dotatom
can only appear within an addr-spec. RFC 2047 Section 5 prohibits
use of an encoded-word in any portion of an addr-spec, so its appearance
in a dotatom is invalid. Adding (and testing) special white-space
handling in an invalid dotatom seems an unnecessary complication.

Switch to @bitdancer's fix from review feedback. Recharacterize space
between ews as fws after parsing in get_phrase (rather than peeking
ahead after first ew in get_word).

Co-authored-by: R David Murray <rdmurray@bitdance.com>

bitdancer removed the stale

label

May 11, 2026

bitdancer added a commit that referenced this pull request

May 13, 2026

…ress headers (GH-130749) (#149787)

RFC 2047 Section 6.2 requires that "any 'linear-white-space' that
separates a pair of adjacent 'encoded-word's is ignored." The modern
header value parser correctly implements that for unstructured headers,
but had missed a case in structured headers. This could cause a parsed
address header to include extraneous spaces in a display-name.

Switch to @bitdancer's fix from review feedback. Recharacterize space
between ews as fws after parsing in get_phrase.

RDM: This fix is dependent on the fact that "subsequent" atoms will never have
leading whitespace because that's been consumed already. I don't think
it's worth adding extra code for the possibility of leading whitespace
because the parser won't produce it. It's a bit of parser fragility in the
face of code changes, but I think that's a minor concern given the
parser design (which is that it consumes whitespace greedily)
(cherry picked from commit 7a4c6df)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>

bitdancer added a commit that referenced this pull request

May 13, 2026

…ress headers (GH-130749) (#149788)

RFC 2047 Section 6.2 requires that "any 'linear-white-space' that
separates a pair of adjacent 'encoded-word's is ignored." The modern
header value parser correctly implements that for unstructured headers,
but had missed a case in structured headers. This could cause a parsed
address header to include extraneous spaces in a display-name.

Switch to @bitdancer's fix from review feedback. Recharacterize space
between ews as fws after parsing in get_phrase.

RDM: This fix is dependent on the fact that "subsequent" atoms will never have
leading whitespace because that's been consumed already. I don't think
it's worth adding extra code for the possibility of leading whitespace
because the parser won't produce it. It's a bit of parser fragility in the
face of code changes, but I think that's a minor concern given the
parser design (which is that it consumes whitespace greedily)
(cherry picked from commit 7a4c6df)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>

bitdancer added a commit that referenced this pull request

May 13, 2026

…ress headers (GH-130749) (#149789)

RFC 2047 Section 6.2 requires that "any 'linear-white-space' that
separates a pair of adjacent 'encoded-word's is ignored." The modern
header value parser correctly implements that for unstructured headers,
but had missed a case in structured headers. This could cause a parsed
address header to include extraneous spaces in a display-name.

Switch to @bitdancer's fix from review feedback. Recharacterize space
between ews as fws after parsing in get_phrase.

RDM: This fix is dependent on the fact that "subsequent" atoms will never have
leading whitespace because that's been consumed already. I don't think
it's worth adding extra code for the possibility of leading whitespace
because the parser won't produce it. It's a bit of parser fragility in the
face of code changes, but I think that's a minor concern given the
parser design (which is that it consumes whitespace greedily)
(cherry picked from commit 7a4c6df)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>