gh-99593: Add tests for Unicode C API (part 1) by serhiy-storchaka · Pull Request #99651 · python/cpython
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Here is my first review :-)
| self.assertRaises(ValueError, split, 'a|b|c|d', '') | ||
| self.assertRaises(TypeError, split, 'a|b|c|d', ord('|')) | ||
| self.assertRaises(TypeError, split, [], '|') | ||
| # split(NULL, '|') |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this comment stand for? Does the function crash with NULL? Same question for similar rsplit() comment below.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.
| self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
| self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
| self.assertEqual(translate('abc', []), 'abc') | ||
| self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is wrong.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. The surprising part is that str.translate() treats None as "delete:
>>> "abc".translate(str.maketrans({'b': None}))
'ac'
Well, it would be nice to update the doc (maybe in a separated PR).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because str.translate calls PyUnicode_Translate() with the error handler "ignore".
| #for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1": | ||
| #for i, ch in enumerate(str): | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i) | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this code commented? if it is meaningless for tailmatch, just remove it?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review Victor. I have a problem with reviewing such large volume of code, especially if many lines looks similar, so I can easily miss some types of errors. Without your help I would not find them.
| self.assertRaises(ValueError, split, 'a|b|c|d', '') | ||
| self.assertRaises(TypeError, split, 'a|b|c|d', ord('|')) | ||
| self.assertRaises(TypeError, split, [], '|') | ||
| # split(NULL, '|') |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.
| self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
| self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
| self.assertEqual(translate('abc', []), 'abc') | ||
| self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is wrong.
| #for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1": | ||
| #for i, ch in enumerate(str): | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i) | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
| self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
| self.assertEqual(translate('abc', []), 'abc') | ||
| self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. The surprising part is that str.translate() treats None as "delete:
>>> "abc".translate(str.maketrans({'b': None}))
'ac'
Well, it would be nice to update the doc (maybe in a separated PR).
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!
Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11
Sorry @serhiy-storchaka, I had trouble checking out the 3.10 backport branch.
Please retry by removing and re-adding the "needs backport to 3.10" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.10
Oh, I didn't notice that you want to backport these tests to Python 3.10 and 3.11. You're motivated :-) If it's too complicated, maybe just add them to Python 3.12, no? _testcapi changed a lot since Python 3.11 (splited into multiple files).
I think that we should backport as many tests as possible, otherwise we risk to miss a regression introduced before the particular test was added. Especially if we do so many changes in C API.
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖
Sorry @serhiy-storchaka, I had trouble checking out the 3.11 backport branch.
Please retry by removing and re-adding the "needs backport to 3.11" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11