◐ Shell
clean mode source ↗

Apply titlecase mapping in str.title() for uppercase digraphs by changjoon-park · Pull Request #7748 · RustPython/RustPython

@changjoon-park

The uppercase/titlecase branch of PyStr::title() pushed characters
unchanged when starting a new word, which left Latin Extended-B
digraphs (U+01F1 'DZ', U+01C4 'DŽ', etc.) in their uppercase form
instead of mapping them to their distinct titlecase counterparts
(U+01F2 'Dz', U+01C5 'Dž'). For ASCII letters and characters where
to_titlecase is identity this had no effect, hiding the bug for the
common case.

Mirror the lowercase branch — which already calls to_titlecase()
when starting a new word — so both branches symmetrically apply
the titlecase mapping. char::to_titlecase is identity for already-
titlecase and ASCII-uppercase characters, so existing cases stay
correct.

Also unmasks test_unicodedata.UnicodeMiscTest.test_bug_4971, which
asserts exactly this behavior (`'DŽ'.title() == 'Dž'` etc.)
and was marked expectedFailure with reason `+ Dž`.

Closes RustPython#7527 (the only example from that issue still failing on
3.14.4; the other four examples already pass on current main).