
Disagreement with CPython about which Unicode characters are alphanumeric

Summary

RustPython's re module treats \w as matching more characters than it should, and certainly more than CPython's \w does. For example, it matches U+0345 COMBINING GREEK YPOGEGRAMMENI, which CPython's \w does not.

Expected

import re

assert not re.match(r"\w", "\u0345"), r"\w should not match U+0345 (category Mn)"

This assertion should pass, and does on CPython, but it fails on RustPython.
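To see which character properties the two interpreters disagree on, a small diagnostic like the one below could be run under both. It is only a sketch: it assumes the interpreter ships the standard unicodedata module, and the helper name describe is hypothetical. On CPython it should report category Mn for U+0345, with isalpha(), isalnum() and the \w match all False.

```python
import re
import unicodedata


def describe(ch):
    # Hypothetical helper: gather the properties relevant to this report
    # for a single character, so the output can be diffed across interpreters.
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "isalpha": ch.isalpha(),
        "isalnum": ch.isalnum(),
        "matches_w": re.match(r"\w", ch) is not None,
    }


if __name__ == "__main__":
    print(describe("\u0345"))
```

Comparing the printed dictionaries from CPython and RustPython shows whether the disagreement starts in str.isalnum() or only in the regex engine.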

Python Documentation

https://docs.python.org/3/library/re.html

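For str patterns, the documentation says \w matches Unicode word characters: anything for which str.isalnum() returns True, plus the underscore. This turns out to be another place where RustPython and CPython disagree: "\u0345".isalnum() returns False on CPython but True on RustPython. CPython's str.isalpha() only counts characters whose general category is Lm, Lt, Lu, Ll or Lo, and U+0345 is in category Mn.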
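Other affected characters could be found by scanning every code point and checking both \w and str.isalnum() against CPython's documented definition, in which "alphabetic" means general category Lm, Lt, Lu, Ll or Lo. The sketch below assumes unicodedata is available. The names reference_is_word and word_mismatches are hypothetical. For the numeric part it reuses the interpreter's own isdecimal(), isdigit() and isnumeric(), so it only isolates the alphabetic part of the definition, which is where U+0345 falls. On CPython the list should be empty. On RustPython, assuming the behavior described above, U+0345 would be expected to appear.

```python
import re
import sys
import unicodedata

# CPython's documented definition of "alphabetic" for str.isalpha():
# general category Lm, Lt, Lu, Ll or Lo.
LETTER_CATEGORIES = {"Lm", "Lt", "Lu", "Ll", "Lo"}
WORD = re.compile(r"\w")


def reference_is_word(ch):
    # Hypothetical reference predicate: letters by general category, plus
    # whatever this interpreter treats as numeric, plus the underscore
    # that \w also accepts.
    return (
        unicodedata.category(ch) in LETTER_CATEGORIES
        or ch.isdecimal()
        or ch.isdigit()
        or ch.isnumeric()
        or ch == "_"
    )


def word_mismatches(limit=sys.maxunicode):
    # Collect code points where either \w or str.isalnum() (plus "_")
    # disagrees with the category-based reference above.
    mismatches = []
    for cp in range(limit + 1):
        ch = chr(cp)
        expected = reference_is_word(ch)
        matches_w = WORD.match(ch) is not None
        alnum_or_underscore = ch.isalnum() or ch == "_"
        if matches_w != expected or alnum_or_underscore != expected:
            mismatches.append(cp)
    return mismatches


if __name__ == "__main__":
    found = word_mismatches()
    print(f"{len(found)} mismatching code points")
    for cp in found[:20]:
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch, '<unnamed>')}")
```

The full scan touches about 1.1 million code points, so it may take a few seconds. Passing a smaller limit (for example 0x3FF, which covers U+0345) gives a quicker check.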