gh-92081: Fix for email.generator.Generator with whitespace between encoded words. by abadger · Pull Request #92281 · python/cpython
abadger
marked this pull request as ready for review
email.generator.Generator currently does not handle whitespace between encoded words correctly when the encoded words span multiple lines. The current generator will create an encoded word for each line. If the end of the line happens to correspond with the end real word in the plaintext, the generator will place an unencoded space at the start of the subsequent lines to represent the whitespace between the plaintext words. A compliant decoder will strip all the whitespace from between two encoded words which leads to missing spaces in the round-tripped output. The fix for this is to make sure that whitespace between two encoded words ends up inside of one or the other of the encoded words. This fix places the space inside of the second encoded word. A second problem happens with continuation lines. A continuation line that starts with whitespace and is followed by a non-encoded word is fine because the newline between such continuation lines is defined as condensing to a single space character. When the continuation line starts with whitespace followed by an encoded word, however, the RFCs specify that the word is run together with the encoded word on the previous line. This is because normal words are filded on syntactic breaks by encoded words are not. The solution to this is to add the whitespace to the start of the encoded word on the continuation line. Test cases are from python#92081
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
…ween encoded words. (pythonGH-92281) * Fix for email.generator.Generator with whitespace between encoded words. email.generator.Generator currently does not handle whitespace between encoded words correctly when the encoded words span multiple lines. The current generator will create an encoded word for each line. If the end of the line happens to correspond with the end real word in the plaintext, the generator will place an unencoded space at the start of the subsequent lines to represent the whitespace between the plaintext words. A compliant decoder will strip all the whitespace from between two encoded words which leads to missing spaces in the round-tripped output. The fix for this is to make sure that whitespace between two encoded words ends up inside of one or the other of the encoded words. This fix places the space inside of the second encoded word. A second problem happens with continuation lines. A continuation line that starts with whitespace and is followed by a non-encoded word is fine because the newline between such continuation lines is defined as condensing to a single space character. When the continuation line starts with whitespace followed by an encoded word, however, the RFCs specify that the word is run together with the encoded word on the previous line. This is because normal words are filded on syntactic breaks by encoded words are not. The solution to this is to add the whitespace to the start of the encoded word on the continuation line. Test cases are from pythonGH-92081 * Rename a variable so it's not confused with the final variable. (cherry picked from commit a6fdb31) Co-authored-by: Toshio Kuratomi <a.badger@gmail.com>
abadger
deleted the
email-bytes-generator-breakage
branch
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
…ween encoded words. (pythonGH-92281) * Fix for email.generator.Generator with whitespace between encoded words. email.generator.Generator currently does not handle whitespace between encoded words correctly when the encoded words span multiple lines. The current generator will create an encoded word for each line. If the end of the line happens to correspond with the end real word in the plaintext, the generator will place an unencoded space at the start of the subsequent lines to represent the whitespace between the plaintext words. A compliant decoder will strip all the whitespace from between two encoded words which leads to missing spaces in the round-tripped output. The fix for this is to make sure that whitespace between two encoded words ends up inside of one or the other of the encoded words. This fix places the space inside of the second encoded word. A second problem happens with continuation lines. A continuation line that starts with whitespace and is followed by a non-encoded word is fine because the newline between such continuation lines is defined as condensing to a single space character. When the continuation line starts with whitespace followed by an encoded word, however, the RFCs specify that the word is run together with the encoded word on the previous line. This is because normal words are filded on syntactic breaks by encoded words are not. The solution to this is to add the whitespace to the start of the encoded word on the continuation line. Test cases are from pythonGH-92081 * Rename a variable so it's not confused with the final variable. (cherry picked from commit a6fdb31) Co-authored-by: Toshio Kuratomi <a.badger@gmail.com>
warsaw pushed a commit that referenced this pull request
…tween encoded words. (GH-92281) (#119245) * Fix for email.generator.Generator with whitespace between encoded words. email.generator.Generator currently does not handle whitespace between encoded words correctly when the encoded words span multiple lines. The current generator will create an encoded word for each line. If the end of the line happens to correspond with the end real word in the plaintext, the generator will place an unencoded space at the start of the subsequent lines to represent the whitespace between the plaintext words. A compliant decoder will strip all the whitespace from between two encoded words which leads to missing spaces in the round-tripped output. The fix for this is to make sure that whitespace between two encoded words ends up inside of one or the other of the encoded words. This fix places the space inside of the second encoded word. A second problem happens with continuation lines. A continuation line that starts with whitespace and is followed by a non-encoded word is fine because the newline between such continuation lines is defined as condensing to a single space character. When the continuation line starts with whitespace followed by an encoded word, however, the RFCs specify that the word is run together with the encoded word on the previous line. This is because normal words are filded on syntactic breaks by encoded words are not. The solution to this is to add the whitespace to the start of the encoded word on the continuation line. Test cases are from GH-92081 * Rename a variable so it's not confused with the final variable. (cherry picked from commit a6fdb31) Co-authored-by: Toshio Kuratomi <a.badger@gmail.com>
warsaw pushed a commit that referenced this pull request
…tween encoded words. (GH-92281) (#119246) * Fix for email.generator.Generator with whitespace between encoded words. email.generator.Generator currently does not handle whitespace between encoded words correctly when the encoded words span multiple lines. The current generator will create an encoded word for each line. If the end of the line happens to correspond with the end real word in the plaintext, the generator will place an unencoded space at the start of the subsequent lines to represent the whitespace between the plaintext words. A compliant decoder will strip all the whitespace from between two encoded words which leads to missing spaces in the round-tripped output. The fix for this is to make sure that whitespace between two encoded words ends up inside of one or the other of the encoded words. This fix places the space inside of the second encoded word. A second problem happens with continuation lines. A continuation line that starts with whitespace and is followed by a non-encoded word is fine because the newline between such continuation lines is defined as condensing to a single space character. When the continuation line starts with whitespace followed by an encoded word, however, the RFCs specify that the word is run together with the encoded word on the previous line. This is because normal words are filded on syntactic breaks by encoded words are not. The solution to this is to add the whitespace to the start of the encoded word on the continuation line. Test cases are from GH-92081 * Rename a variable so it's not confused with the final variable. (cherry picked from commit a6fdb31) Co-authored-by: Toshio Kuratomi <a.badger@gmail.com>
estyxx pushed a commit to estyxx/cpython that referenced this pull request
…ween encoded words. (python#92281) * Fix for email.generator.Generator with whitespace between encoded words. email.generator.Generator currently does not handle whitespace between encoded words correctly when the encoded words span multiple lines. The current generator will create an encoded word for each line. If the end of the line happens to correspond with the end real word in the plaintext, the generator will place an unencoded space at the start of the subsequent lines to represent the whitespace between the plaintext words. A compliant decoder will strip all the whitespace from between two encoded words which leads to missing spaces in the round-tripped output. The fix for this is to make sure that whitespace between two encoded words ends up inside of one or the other of the encoded words. This fix places the space inside of the second encoded word. A second problem happens with continuation lines. A continuation line that starts with whitespace and is followed by a non-encoded word is fine because the newline between such continuation lines is defined as condensing to a single space character. When the continuation line starts with whitespace followed by an encoded word, however, the RFCs specify that the word is run together with the encoded word on the previous line. This is because normal words are filded on syntactic breaks by encoded words are not. The solution to this is to add the whitespace to the start of the encoded word on the continuation line. Test cases are from python#92081 * Rename a variable so it's not confused with the final variable.
bitdancer added a commit that referenced this pull request
) The fix for gh-92081 (gh-92281) was unfortunately flawed, and broke whitespace handling for encoded word patterns that had previously been working correctly but had no corresponding tests, unfortunately in a way that made the resulting headers not RFC compliant, in such a way that Yahoo started rejecting the resulting emails. This fix was released in 3.14 alpha 1, 3.13 beta 2 and 3.12.5. This PR fixes the original problem in a way that does not break anything, and in fact fixes a small pre-existing bug (a spurious whitespace after the ':' of the header label if the header value is immediately wrapped on to the next line). (RDM) Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: R. David Murray <rdmurray@bitdance.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
…pythonGH-144692) The fix for pythongh-92081 (pythongh-92281) was unfortunately flawed, and broke whitespace handling for encoded word patterns that had previously been working correctly but had no corresponding tests, unfortunately in a way that made the resulting headers not RFC compliant, in such a way that Yahoo started rejecting the resulting emails. This fix was released in 3.14 alpha 1, 3.13 beta 2 and 3.12.5. This PR fixes the original problem in a way that does not break anything, and in fact fixes a small pre-existing bug (a spurious whitespace after the ':' of the header label if the header value is immediately wrapped on to the next line). (RDM) (cherry picked from commit 0f7cd55) Co-authored-by: Robsdedude <dev@rouvenbauer.de> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: R. David Murray <rdmurray@bitdance.com>
bitdancer added a commit that referenced this pull request
GH-144692) (#145009) gh-144156: Fix email header folding concatenating encoded words (GH-144692) The fix for gh-92081 (gh-92281) was unfortunately flawed, and broke whitespace handling for encoded word patterns that had previously been working correctly but had no corresponding tests, unfortunately in a way that made the resulting headers not RFC compliant, in such a way that Yahoo started rejecting the resulting emails. This fix was released in 3.14 alpha 1, 3.13 beta 2 and 3.12.5. This PR fixes the original problem in a way that does not break anything, and in fact fixes a small pre-existing bug (a spurious whitespace after the ':' of the header label if the header value is immediately wrapped on to the next line). (RDM) (cherry picked from commit 0f7cd55) Co-authored-by: Robsdedude <dev@rouvenbauer.de> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: R. David Murray <rdmurray@bitdance.com>
bitdancer added a commit that referenced this pull request
GH-144692) (#145195) The fix for gh-92081 (gh-92281) was unfortunately flawed, and broke whitespace handling for encoded word patterns that had previously been working correctly but had no corresponding tests, unfortunately in a way that made the resulting headers not RFC compliant, in such a way that Yahoo started rejecting the resulting emails. This fix was released in 3.14 alpha 1, 3.13 beta 2 and 3.12.5. This PR fixes the original problem in a way that does not break anything, and in fact fixes a small pre-existing bug (a spurious whitespace after the ':' of the header label if the header value is immediately wrapped on to the next line). (RDM) (cherry picked from commit 0f7cd55) Co-authored-by: Robsdedude <dev@rouvenbauer.de> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
brijkapadia pushed a commit to brijkapadia/cpython that referenced this pull request
…python#144692) The fix for pythongh-92081 (pythongh-92281) was unfortunately flawed, and broke whitespace handling for encoded word patterns that had previously been working correctly but had no corresponding tests, unfortunately in a way that made the resulting headers not RFC compliant, in such a way that Yahoo started rejecting the resulting emails. This fix was released in 3.14 alpha 1, 3.13 beta 2 and 3.12.5. This PR fixes the original problem in a way that does not break anything, and in fact fixes a small pre-existing bug (a spurious whitespace after the ':' of the header label if the header value is immediately wrapped on to the next line). (RDM) Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: R. David Murray <rdmurray@bitdance.com>
ljfp pushed a commit to ljfp/cpython that referenced this pull request
…python#144692) The fix for pythongh-92081 (pythongh-92281) was unfortunately flawed, and broke whitespace handling for encoded word patterns that had previously been working correctly but had no corresponding tests, unfortunately in a way that made the resulting headers not RFC compliant, in such a way that Yahoo started rejecting the resulting emails. This fix was released in 3.14 alpha 1, 3.13 beta 2 and 3.12.5. This PR fixes the original problem in a way that does not break anything, and in fact fixes a small pre-existing bug (a spurious whitespace after the ':' of the header label if the header value is immediately wrapped on to the next line). (RDM) Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: R. David Murray <rdmurray@bitdance.com>