gh-99593: Add tests for Unicode C API (part 2) by serhiy-storchaka · Pull Request #99868 · python/cpython
This was referenced
| class CAPITest(unittest.TestCase): | ||
| # TODO: Test the following function: | ||
| # | ||
| # PyUnicode_ClearFreeList |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyUnicode_ClearFreeList was removed in Python 3.9!
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, these tests were written when it was here.
| self.assertEqual(new(0, maxchar), '') | ||
| self.assertEqual(new(5, maxchar), chr(maxchar)*5) | ||
| self.assertEqual(new(0, 0x110000), '') | ||
| self.assertRaises(SystemError, new, 5, 0x110000) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this error should become a ValueError, but it can be changed outside this PR, since you seem to want to backport these news tests.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that SystemError is better exception type here. It is a misuse of the C API, you cannot get this error from Python code.
| if (!result) { | ||
| return NULL; | ||
| } | ||
| if (size > 0 && maxchar <= 0x10ffff && |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds dangerous to return a string to the "Python space" if characters are not initialized when maxchar is greated than 0x10ffff. Can you remove maxchar <= 0x10ffff condition? PyUnicode_Fill() must fail if the fill character is too big (greater than 0x10ffff).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a test for PyUnicode_New(), not for PyUnicode_Fill(). We should see exceptions raised by PyUnicode_New(), not PyUnicode_Fill().
It never happens, because PyUnicode_New() returns NULL if size > 0 and maxchar > 0x10ffff. If it will not return NULL, it is better to get a malformed string than get an exception raised by PyUnicode_Fill() and think that it was raised by PyUnicode_New().
| } | ||
|
|
||
| result = PyUnicode_WriteChar(to_copy, index, (Py_UCS4)character); | ||
| if (result == -1 && PyErr_Occurred()) { |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, -1 means an error. You shouldn't have to check if an exception was raised or not. Or it should become: assert(PyErr_Occurred()).
Same remark for other test wrapper like the one for PyUnicode_Resize().
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an exception was raised, but result is not -1, then what? The code will return NULL, and we will never know that something wrong happened with PyUnicode_WriteChar().
I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.
C assert() are enabled on release (no-debug) builds: see the following code in Modules/_testcapi/parts.h:
// Always enable assertions
#undef NDEBUG
Many _testcapi tests are only implemented with assert().
|
|
||
| if (!PyUnicode_AsUCS4(unicode, buffer, buf_len, copy_null)) { | ||
| PyMem_Free(buffer); | ||
| PyMem_FREE(buffer); |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyMem_FREE is a deprecated alias to PyMem_Free(). I would expect the opposite change, replace PyMem_FREE with PyMem_Free :-)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted. The code in my branch was written a long time ago, when PyMem_FREE and PyMem_Free were different things. I just copied it over the current code.
| self.assertRaises(SystemError, append, 'abc', NULL) | ||
| # TODO: Test PyUnicode_Append() with modifiable unicode | ||
| # and with NULL as the address. | ||
| # TODO: Check reference counts. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's done automatically by Refleaks buildbots, no?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. But since this C API does unusual things with reference counts, it would be better to test it explicitly, if possible.
| if SIZEOF_WCHAR_T == 2: | ||
| if sys.byteorder == 'little': | ||
| encoding = 'utf-16le' | ||
| elif sys.byteorder == 'little': |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition looks wrong. Maybe just use else:? I don't think that Python supports other endianness. Same a few lines below.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I though that explicit check can be better, but made an error in it.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for my addressing my review.
| } | ||
|
|
||
| result = PyUnicode_WriteChar(to_copy, index, (Py_UCS4)character); | ||
| if (result == -1 && PyErr_Occurred()) { |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.
C assert() are enabled on release (no-debug) builds: see the following code in Modules/_testcapi/parts.h:
// Always enable assertions
#undef NDEBUG
Many _testcapi tests are only implemented with assert().
@serhiy-storchaka Victor approved this, ready to merge?
(I removed the 3.10 backport label, it's now security only)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The CI test "DO-NOT-MERGE / unresolved review" was blocked on "Waiting for status to be reported". I tried to update the PR on main to see if it does unblock the PR.
Sorry, @serhiy-storchaka and @vstinner, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2ba931ff727395cf89b290ed313a8e15db0bfcf1 3.11
Merged, thanks @serhiy-storchaka.
@serhiy-storchaka: If you consider that this change should be backported to Python 3.11, go ahead. But the automated backport failed for an unknown reason.
Sorry, @serhiy-storchaka and @vstinner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2ba931ff727395cf89b290ed313a8e15db0bfcf1 3.12