◐ Shell
clean mode source ↗

Message 81115 - Python tracker

> I must be missing some detail, but what does the Unicode database
> have to do with the unicodeobject.c C API ?

Ah, now I understand your concerns. My suggestion is to change only the 20 functions in 
unicodectype.c: _PyUnicode_IsAlpha, _PyUnicode_ToLowercase... and no change in 
unicodeobject.c at all.
They all take a single code point as argument, some also return a single code point.
Changing these functions is backwards compatible.

I join a patch so we can argue on concrete code (tests are missing).

Another effect of the patch: unicodedata.numeric('\N{AEGEAN NUMBER TWO}') can return 2.0.

The str.isalpha() (and others) methods did not change: they still split the surrogate pairs.