bpo-37751: Fix normalizestring() with hyphens and spaces converted to underscores #15092

fusioncid

Fix normalizestring() with hyphens and spaces converted to underscores
Reuse _Py_normalize_encoding() in normalizestring()

https://bugs.python.org/issue37751


the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

shihai1991

All the checks have passed, so LGTM.

vstinner

Would it be possible to reuse _Py_normalize_encoding() in codecs.c normalizestring()?

fusioncid

Would it be possible to reuse _Py_normalize_encoding() in codecs.c normalizestring()?

I went through the two functions and found that they do very similar things. I also tried modifying the code, and the test cases passed.

The code can be modified in the same way that check_force_ascii() calls _Py_normalize_encoding(). I also think it would be better to expose _Py_normalize_encoding() as an external function that other modules can call.

There is also a process question: should I open another issue to discuss this modification, or can I make the change directly under this issue?

fusioncid

Would it be possible to reuse _Py_normalize_encoding() in codecs.c normalizestring()?

I will do more testing of reusing _Py_normalize_encoding() in codecs.c normalizestring(). Thank you for your helpful suggestions.

vstinner

You can create a new PR, this one can be closed when the new one is merged.

fusioncid

You can create a new PR, this one can be closed when the new one is merged.

Let me try it.


bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

I think using a regular exception is better. Thanks. Co-Authored-By: Victor Stinner <vstinner@redhat.com>

vstinner

Can you please try to add a NEWS entry using the blurb tool? (install it using: python3 -m pip install --user blurb)


fusioncid

Can you please try to add a NEWS entry using the blurb tool? (install it using: python3 -m pip install --user blurb)

Thank you for your careful guidance. I need to take a moment to familiarize myself with the blurb tool.

fusioncid

I have made the requested changes; please review again.

bedevere-bot

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

vstinner

LGTM. Thanks for the update.

Fix codecs.lookup() to normalize the encoding name the same way as encodings.normalize_encoding(), except that codecs.lookup() also converts the name to lower case.
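To make the rule in that commit message concrete, here is a minimal Python sketch of this kind of normalization. It is not the C code from this PR: the function name normalize_like_encodings is hypothetical, and treating non-ASCII characters as separators is an assumption made to keep the example small.

```python
import codecs


def normalize_like_encodings(name: str) -> str:
    """Hypothetical sketch: collapse each run of separators into one
    underscore, drop leading/trailing separators, and lower-case."""
    chars = []
    pending_sep = False
    for ch in name:
        # ASCII letters, digits and the dot are kept; the dot is preserved
        # because it can appear in package-style codec names.
        if ch.isascii() and (ch.isalnum() or ch == "."):
            if pending_sep and chars:
                chars.append("_")
            chars.append(ch.lower())
            pending_sep = False
        else:
            # Hyphens, spaces and anything else count as a separator.
            pending_sep = True
    return "".join(chars)


print(normalize_like_encodings("UTF-8"))            # utf_8
print(normalize_like_encodings("latin 1"))          # latin_1
print(normalize_like_encodings("  ISO--8859 1 "))   # iso_8859_1

# The built-in lookup accepts the upper-case spelling as well.
print(codecs.lookup("UTF-8").name)                  # utf-8
```

Under this rule, hyphens and spaces in an encoding name all end up as single underscores, which is the behavior the PR title refers to.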

* Fix running with Python 3.9

  Since Python 3.9 [1], codecs names are normalized in a different way.

  [1] python/cpython#15092

* Add: github action, bump dependencies

Co-authored-by: eight04 <eight04@gmail.com>
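A third-party codec provider can guard against this kind of change by accepting both spellings in its search function. The sketch below is only an illustration: the codec name x-demo-codec and the function recording_search are hypothetical, and the demo codec simply borrows the UTF-8 encode and decode functions.

```python
import codecs

# Borrow UTF-8's functions so the hypothetical codec does real work.
_utf8 = codecs.lookup("utf-8")

seen = []  # names exactly as the codec registry passes them in


def recording_search(name):
    """Hypothetical search function that tolerates both spellings."""
    seen.append(name)
    # Depending on the Python version, the registry may hand over the
    # hyphenated or the underscored form of the requested name.
    if name in ("x-demo-codec", "x_demo_codec"):
        return codecs.CodecInfo(
            encode=_utf8.encode,
            decode=_utf8.decode,
            name="x-demo-codec",
        )
    return None


codecs.register(recording_search)

info = codecs.lookup("X-Demo-Codec")
print(info.name)   # x-demo-codec
print(seen[-1])    # spelling depends on the interpreter version
```

Matching on both forms keeps the provider working whichever normalization the interpreter applies before calling the search function.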

The codecs lookup function now performs only minimal normalization of the encoding name before passing it to the search functions: all ASCII letters are converted to lower case, spaces are replaced with hyphens. Excessive normalization broke third-party codecs providers, like python-iconv.

Revert "bpo-37751: Fix codecs.lookup() normalization (pythonGH-15092)"

This reverts commit 20f59fe.
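The minimal normalization described in that message can be sketched in Python as follows. This is an illustration of the stated rule only, not the C implementation; the function name minimal_normalize and the sample name "My.Codec//OPT" are hypothetical.

```python
def minimal_normalize(name: str) -> str:
    """Hypothetical sketch: lower-case ASCII letters and turn spaces
    into hyphens, leaving every other character untouched."""
    out = []
    for ch in name:
        if ch == " ":
            out.append("-")
        elif "A" <= ch <= "Z":
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)


print(minimal_normalize("UTF 8"))          # utf-8
print(minimal_normalize("ISO_8859-1"))     # iso_8859-1
print(minimal_normalize("My.Codec//OPT"))  # my.codec//opt
```

Because punctuation other than the space passes through unchanged, a search function that relies on characters such as hyphens or slashes in the name still receives them.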


bpo-37751:Fix normalizestring() with hyphens and spaces converted to underscores

95c6408

the-knights-who-say-ni added the CLA not signed label Aug 3, 2019

bedevere-bot added the awaiting review label Aug 3, 2019

shihai1991 approved these changes Aug 3, 2019


bedevere-bot added awaiting core review and removed awaiting review labels Aug 3, 2019

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Aug 5, 2019

fusioncid and others added 3 commits August 17, 2019 18:54

bpo-37751:Fix normalizestring() with hyphens and spaces converted to underscores

e7eafdb

bpo-37751:Fix normalizestring() with hyphens and spaces converted to underscores

ba5138f

delete cscope.out

f3eb230

fusioncid changed the title ~~bpo-37751: Fix normalizestring() with hyphens and spaces converted to…~~ Aug 17, 2019

vstinner requested changes Aug 19, 2019


bedevere-bot removed the awaiting core review label Aug 19, 2019

bedevere-bot added the awaiting changes label Aug 19, 2019

Update Python/codecs.c …

e51724b

I think using a regular exception is better. Thanks. Co-Authored-By: Victor Stinner <vstinner@redhat.com>

fusioncid added 2 commits August 19, 2019 21:14

bpo-37751:Fix normalizestring() with hyphens and spaces converted to underscores

05eb71f

Merge branch 'fix-issue-37751' of https://github.com/qigangxu/cpython into fix-issue-37751

07b500d

📜🤖 Added by blurb_it.

2396200

bedevere-bot removed the awaiting changes label Aug 20, 2019

bedevere-bot added the awaiting change review label Aug 20, 2019

vstinner reviewed Aug 20, 2019


bpo-37751: Optimize NEWS.d doc for normalizestring() fixing

6178176

vstinner approved these changes Aug 21, 2019


bedevere-bot added awaiting merge and removed awaiting change review labels Aug 21, 2019

vstinner merged commit 20f59fe into python:master Aug 21, 2019

bedevere-bot removed the awaiting merge label Aug 21, 2019

fusioncid deleted the fix-issue-37751 branch August 21, 2019 14:24

yan12125 mentioned this pull request Dec 2, 2020

Fix running with Python 3.9 eight04/pyUAO#1

Merged

serhiy-storchaka mentioned this pull request Jul 28, 2025

gh-88886: Remove excessive encoding name normalization #137167

Merged

Conversation

fusioncid commented Aug 3, 2019 (edited)

the-knights-who-say-ni commented Aug 3, 2019

shihai1991 left a comment

vstinner commented Aug 12, 2019

fusioncid commented Aug 14, 2019

fusioncid commented Aug 14, 2019

vstinner commented Aug 14, 2019

fusioncid commented Aug 15, 2019

bedevere-bot commented Aug 19, 2019

vstinner commented Aug 19, 2019

fusioncid commented Aug 19, 2019

fusioncid commented Aug 20, 2019

bedevere-bot commented Aug 20, 2019

vstinner left a comment

5 participants