bpo-38043: Move unicodedata.normalize tests into test_unicodedata.#15712

gnprice

Having these in a separate file from the one that's named after the
module in the usual way makes it very easy to miss them when looking
for tests for these two functions.

(In fact when working recently on is_normalized, I'd been surprised to
see no tests for it here and concluded the function had evaded being
tested at all. I'd gone as far as to write up some tests myself
before I spotted this other file.)

Mostly this just means moving all the one file's code into the other,
and moving code from the module toplevel to inside the test class to
keep it tidily separate from the rest of the file's code.

There's one substantive change, which reduces by a bit the amount of
code to be moved: we drop the x > sys.maxunicode conditional and all
the RangeError logic behind it. Now if that condition ever occurs
it will cause an error at chr(x), and a test failure. That's the
right result because, since PEP 393 in Python 3.3, there is no longer
such a thing as an "unsupported character".

https://bugs.python.org/issue38043

Having these in a separate file from the one that's named after the module in the usual way makes it very easy to miss them when looking for tests for these two functions. (In fact when working recently on is_normalized, I'd been surprised to see no tests for it here and concluded the function had evaded being tested at all. I'd gone as far as to write up some tests myself before I spotted this other file.) Mostly this just means moving all the one file's code into the other, and moving code from the module toplevel to inside the test class to keep it tidily separate from the rest of the file's code. There's one substantive change, which reduces by a bit the amount of code to be moved: we drop the `x > sys.maxunicode` conditional and all the `RangeError` logic behind it. Now if that condition ever occurs it will cause an error at `chr(x)`, and a test failure. That's the right result because, since PEP 393 in Python 3.3, there is no longer such a thing as an "unsupported character".

aeros

Thanks for the PR @gnprice. As far as I'm aware, we typically don't require news entries for changes that exclusively involve the tests, so for now I've added the skip news label. I'll also CC the experts for unicodedata.

/cc @malemburg @ezio-melotti

aeros

I have a few minor suggestions that wouldn't be worth opening their own PR for, but might be worth addressing while we're moving the tests.

gnprice

Thanks @aeros167 !

I agree with most of your comments in that if I were writing this code new, or reviewing it when newly-written, I'd prefer the code be the way you suggest.

I've left things like a short local-variable name, and the couple of unnecessary small comments, the way they are because I've largely taken a policy of leaving the code unchanged wherever possible. That's because I think that's helpful in reducing the work needed to read and review the change.

Especially when moving a good-sized chunk of code around, if a reader wants to 100% check my work for themselves, they'll have to go comparing the deleted and added code bit by bit and spot where it differs -- then study each of those differences to understand what they do. (Tools like git diff --color-moved help in skimming past what didn't change, but when something did change you still have to study that.) In this case there were a few things that did change, as described in the PR description. But it helps if that's kept to a minimum.

gnprice

(also for ease of cross-reference here's where @benjaminp originally suggested this change: #15558 (comment) )

aeros

@gnprice:

I've left things like a short local-variable name, and the couple of unnecessary small comments, the way they are because I've largely taken a policy of leaving the code unchanged wherever possible. That's because I think that's helpful in reducing the work needed to read and review the change.

I can definitely understand the logic here, the less changes involved in a PR the easier it is to review. However, I think some of the minor refactoring is needed over time. If everyone were to follow the mentality of leaving the surrounding code as unchanged as possible, these minor issues would add up to be more significant over time.

I wouldn't mind attempting to apply these minor changes in a separate PR, but it can be incredibly difficult to find a core developer with the time to review those types of changes by themselves. From a review perspective, it's usually easier to group them in with other changes. That's why I usually try to suggest for several of them to be addressed when a PR happens to be modifying (or moving in this case) the surrounding code.

It's a fine balance though, and I certainly don't expect all of my suggestions to make it to the final merged PR. As long as some of them are addressed, it provides a gradual improvement in the long term.

benjaminp

I think it's fine to submit a PR that does the minimal move because it eases review and clarifies version history.

gnprice

but it can be incredibly difficult to find a core developer with the time to review those types of changes by themselves. From a review perspective, it's usually easier to group them in with other changes.

Yeah, you've put your finger on a real tension here.

Thing is, while it is easier to get small code cleanups in when they're squashed into some bigger, otherwise-independent change... I think it's actually more review work that way. (For one-line cleanups like deleting those unhelpful comments, the stakes are low all around, but I think this is clearer for cleanups that are just a little more complex -- and often also higher-value.)

So the current equilibrium of how people write and send PRs, how they review them, and how they merge them, naturally pushes you in that direction; but I think it's actually one of the contributing factors in the shortage we have of review/merge effort. The thing about being an equilibrium is that it's infeasible to totally opt out -- this PR already contained what would have been at least three separate changes if we were in a workflow equilibrium where many, small, focused changes was the norm -- but I try to lean in that direction.

Which also includes sending some small cleanups as PRs of their own. One such was def97c9 (in this same file, as it happens). One merged just today was 3cbc23a -- waited a few weeks, but it's just as good now as then, and I expect it was very little work for the reviewers/merger. If you run git log --stat -p --author=gnprice in your clone of the repo, you'll see more examples. Looking through the logs in general, there are others as small as 3aa48b8.

aeros

@benjaminp:

I think it's fine to submit a PR that does the minimal move because it eases review and clarifies version history.

Would you suggest having code comment cleanups in their own PR then? My main concern was that it would end up getting buried. Code comments would generally fall under documentation, and the currently active doc experts are stretched thin as is, in comparison to the volume of doc-only PRs.

In theory, any core dev could review doc PRs, but most tend to be focused on their areas of expertise/interest. This completely makes sense and I definitely don't blame anyone for the situation, but it makes it rather difficult to have them reviewed/merged.

This also applies to rather small PRs, see #14698 as an example. I currently have 3 doc-only PRs open and have several other ideas for ones that I'd like to open, but I wanted to avoid adding further to the list of doc PRs that are in need review from a core developer.

We could use some more active documentation experts, but that's definitely easier said than done. I would be interested in helping with that issue, but I think I need more experience and mentoring first. I only recently became a member of the Triage team last month and started actively contributing in June.

aeros

@gnprice:

Thing is, while it is easier to get small code cleanups in when they're squashed into some bigger, otherwise-independent change... I think it's actually more review work that way.

So the current equilibrium of how people write and send PRs, how they review them, and how they merge them, naturally pushes you in that direction; but I think it's actually one of the contributing factors in the shortage we have of review/merge effort

Hmm, interesting. I hadn't considered that but I think you might be correct. I think a significant part of my push in that direction has been the slower response times to documentation only PRs (which I detailed more in my comment above). I've tried to assist with alleviating this issue by focusing my efforts on PR review, but the main bottleneck seems to be on the final core dev review. Perhaps smaller and more specific PRs could help with that.

I'm uncertain though if I should try to leave the code cleanup and minor documentation PRs on the backlog for now while I wait for the others to be merged and continue to focus on review, or try to do more of a balance of both. Do you have any recommendations?

benjaminp

@benjaminp:

I think it's fine to submit a PR that does the minimal move because it eases review and clarifies version history.

Would you suggest having code comment cleanups in their own PR then? My main concern was that it would end up getting buried. Code comments would generally fall under documentation, and the currently active doc experts are stretched thin as is, in comparison to the volume of doc-only PRs.

I don't think this problem is specific to doc-only PRs. In general, review time ≪ amount of PRs to review.

aeros

@benjaminp:

I don't think this problem is specific to doc-only PRs. In general, review time ≪ amount of PRs to review.

Oh yeah definitely, it's not an exclusive issue to doc-only PRs, but from my experience thus far it seems to be a bit more of an issue for them than it is for most other PRs. At least from my personal experience, I usually see significantly faster response times from CCing other experts than I do when trying to CC the documentation experts. There are some other areas which have longer response times than the documentation experts, but that's usually due to a lack of an active expert, such as for argparse.

gnprice

Thanks @benjaminp for the merge!

At least from my personal experience, I usually see significantly faster response times from CCing other experts than I do when trying to CC the documentation experts.

@aeros167 One thought this brings to mind: when submitting an issue or PR in the documentation for some area, you might get a response more easily by CCing experts in that area rather than in documentation generally.

I'd expect documentation experts to be most able to help for things in the docs infrastructure (e.g. the stuff in Doc/tools/), and things that cut across large parts of the documentation. (Plus of course all the other things the same person happens to be an expert on.) On the other hand if someone's taking care of e.g. a given stdlib module, I think they'd typically see that module's docs as part of what they're taking care of and are well-prepared to review.

…ythonGH-15712) Having these in a separate file from the one that's named after the module in the usual way makes it very easy to miss them when looking for tests for these two functions. (In fact when working recently on is_normalized, I'd been surprised to see no tests for it here and concluded the function had evaded being tested at all. I'd gone as far as to write up some tests myself before I spotted this other file.) Mostly this just means moving all the one file's code into the other, and moving code from the module toplevel to inside the test class to keep it tidily separate from the rest of the file's code. There's one substantive change, which reduces by a bit the amount of code to be moved: we drop the `x > sys.maxunicode` conditional and all the `RangeError` logic behind it. Now if that condition ever occurs it will cause an error at `chr(x)`, and a test failure. That's the right result because, since PEP 393 in Python 3.3, there is no longer such a thing as an "unsupported character".

gnprice requested a review from a team as a code owner September 6, 2019 05:24

the-knights-who-say-ni added the CLA signed label Sep 6, 2019

bedevere-bot added the awaiting review label Sep 6, 2019

aeros added skip news tests Tests in the Lib/test dir labels Sep 7, 2019

aeros reviewed Sep 7, 2019

View reviewed changes

aeros suggested changes Sep 7, 2019

View reviewed changes

bedevere-bot added awaiting core review and removed awaiting review labels Sep 7, 2019

gnprice commented Sep 9, 2019

View reviewed changes

benjaminp reviewed Sep 9, 2019

View reviewed changes

Fold old test_normalize function in too, as test_edge_cases.

8803950

benjaminp merged commit 1ad0c77 into python:master Sep 10, 2019

bedevere-bot removed the awaiting core review label Sep 10, 2019

gnprice deleted the pr-normalize-test-file branch September 11, 2019 04:51

gnprice mentioned this pull request Apr 10, 2022

small cleanups in Unicode normalization code #82224

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-38043: Move unicodedata.normalize tests into test_unicodedata.#15712

bpo-38043: Move unicodedata.normalize tests into test_unicodedata.#15712
benjaminp merged 2 commits into
python:masterfrom
gnprice:pr-normalize-test-file

gnprice commented Sep 6, 2019 •

edited by bedevere-bot

Loading

Uh oh!

aeros left a comment •

edited

Loading

Uh oh!

aeros left a comment

Uh oh!

gnprice left a comment

Uh oh!

gnprice commented Sep 9, 2019

Uh oh!

aeros commented Sep 9, 2019

Uh oh!

benjaminp left a comment

Uh oh!

gnprice commented Sep 10, 2019

Uh oh!

aeros commented Sep 10, 2019

Uh oh!

aeros commented Sep 10, 2019

Uh oh!

benjaminp commented Sep 10, 2019

Uh oh!

aeros commented Sep 10, 2019

Uh oh!

gnprice commented Sep 11, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

gnprice commented Sep 6, 2019 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aeros left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aeros left a comment

Choose a reason for hiding this comment

Uh oh!

gnprice left a comment

Choose a reason for hiding this comment

Uh oh!

gnprice commented Sep 9, 2019

Uh oh!

aeros commented Sep 9, 2019

Uh oh!

benjaminp left a comment

Choose a reason for hiding this comment

Uh oh!

gnprice commented Sep 10, 2019

Uh oh!

aeros commented Sep 10, 2019

Uh oh!

aeros commented Sep 10, 2019

Uh oh!

benjaminp commented Sep 10, 2019

Uh oh!

aeros commented Sep 10, 2019

Uh oh!

gnprice commented Sep 11, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

gnprice commented Sep 6, 2019 •

edited by bedevere-bot

Loading

aeros left a comment •

edited

Loading