◐ Shell
clean mode source ↗

gh-99593: Add tests for Unicode C API (part 2) by serhiy-storchaka · Pull Request #99868 · python/cpython

Add tests for lower-level functions.

This was referenced

Nov 29, 2022

vstinner

class CAPITest(unittest.TestCase):
# TODO: Test the following function:
#
# PyUnicode_ClearFreeList

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyUnicode_ClearFreeList was removed in Python 3.9!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, these tests were written when it was here.

self.assertEqual(new(0, maxchar), '')
self.assertEqual(new(5, maxchar), chr(maxchar)*5)
self.assertEqual(new(0, 0x110000), '')
self.assertRaises(SystemError, new, 5, 0x110000)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this error should become a ValueError, but it can be changed outside this PR, since you seem to want to backport these news tests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that SystemError is better exception type here. It is a misuse of the C API, you cannot get this error from Python code.

if (!result) {
return NULL;
}
if (size > 0 && maxchar <= 0x10ffff &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds dangerous to return a string to the "Python space" if characters are not initialized when maxchar is greated than 0x10ffff. Can you remove maxchar <= 0x10ffff condition? PyUnicode_Fill() must fail if the fill character is too big (greater than 0x10ffff).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a test for PyUnicode_New(), not for PyUnicode_Fill(). We should see exceptions raised by PyUnicode_New(), not PyUnicode_Fill().

It never happens, because PyUnicode_New() returns NULL if size > 0 and maxchar > 0x10ffff. If it will not return NULL, it is better to get a malformed string than get an exception raised by PyUnicode_Fill() and think that it was raised by PyUnicode_New().

}

result = PyUnicode_WriteChar(to_copy, index, (Py_UCS4)character);
if (result == -1 && PyErr_Occurred()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, -1 means an error. You shouldn't have to check if an exception was raised or not. Or it should become: assert(PyErr_Occurred()).

Same remark for other test wrapper like the one for PyUnicode_Resize().

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an exception was raised, but result is not -1, then what? The code will return NULL, and we will never know that something wrong happened with PyUnicode_WriteChar().

I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.

C assert() are enabled on release (no-debug) builds: see the following code in Modules/_testcapi/parts.h:

// Always enable assertions
#undef NDEBUG

Many _testcapi tests are only implemented with assert().


if (!PyUnicode_AsUCS4(unicode, buffer, buf_len, copy_null)) {
PyMem_Free(buffer);
PyMem_FREE(buffer);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_FREE is a deprecated alias to PyMem_Free(). I would expect the opposite change, replace PyMem_FREE with PyMem_Free :-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted. The code in my branch was written a long time ago, when PyMem_FREE and PyMem_Free were different things. I just copied it over the current code.

self.assertRaises(SystemError, append, 'abc', NULL)
# TODO: Test PyUnicode_Append() with modifiable unicode
# and with NULL as the address.
# TODO: Check reference counts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's done automatically by Refleaks buildbots, no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. But since this C API does unusual things with reference counts, it would be better to test it explicitly, if possible.

if SIZEOF_WCHAR_T == 2:
if sys.byteorder == 'little':
encoding = 'utf-16le'
elif sys.byteorder == 'little':

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition looks wrong. Maybe just use else:? I don't think that Python supports other endianness. Same a few lines below.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I though that explicit check can be better, but made an error in it.

@serhiy-storchaka

Thank you for your review Victor.

vstinner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for my addressing my review.

}

result = PyUnicode_WriteChar(to_copy, index, (Py_UCS4)character);
if (result == -1 && PyErr_Occurred()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.

C assert() are enabled on release (no-debug) builds: see the following code in Modules/_testcapi/parts.h:

// Always enable assertions
#undef NDEBUG

Many _testcapi tests are only implemented with assert().

@hugovk

@serhiy-storchaka Victor approved this, ready to merge?

(I removed the 3.10 backport label, it's now security only)

vstinner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner

The CI test "DO-NOT-MERGE / unresolved review" was blocked on "Waiting for status to be reported". I tried to update the PR on main to see if it does unblock the PR.

@miss-islington

@miss-islington

Sorry, @serhiy-storchaka and @vstinner, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2ba931ff727395cf89b290ed313a8e15db0bfcf1 3.11

@vstinner

Merged, thanks @serhiy-storchaka.

@serhiy-storchaka: If you consider that this change should be backported to Python 3.11, go ahead. But the automated backport failed for an unknown reason.

carljm added a commit to carljm/cpython that referenced this pull request

May 5, 2023

@miss-islington

@miss-islington

Sorry, @serhiy-storchaka and @vstinner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2ba931ff727395cf89b290ed313a8e15db0bfcf1 3.12