◐ Shell
clean mode source ↗

Use Unicode properties for alnum, alpha, etc. by joshuamegnauth54 · Pull Request #7626 · RustPython/RustPython

959-979: Prefer matches! over array .contains(&..) for small enum sets.

[NumericType::Decimal, NumericType::Digit, NumericType::Numeric].contains(&NumericType::for_char(c)) constructs a stack array and does a linear search on every character. matches! is more idiomatic, reads better, and the compiler lowers it to a single jump table / bitwise test. Same applies to isdigit on Line 977.

Also: the lingering comment on Line 974 ("python's isdigit also checks if exponents are digits, these are the unicode codepoints for exponents") referred to the old hardcoded superscript list; with NumericType::Digit handling those uniformly, the tail of that sentence is stale and could be trimmed.

♻️ Proposed cleanup
     #[pymethod]
     fn isnumeric(&self) -> bool {
         !self.data.is_empty()
             && self.char_all(|c| {
-                [
-                    NumericType::Decimal,
-                    NumericType::Digit,
-                    NumericType::Numeric,
-                ]
-                .contains(&NumericType::for_char(c))
+                matches!(
+                    NumericType::for_char(c),
+                    NumericType::Decimal | NumericType::Digit | NumericType::Numeric
+                )
             })
     }

     #[pymethod]
     fn isdigit(&self) -> bool {
-        // python's isdigit also checks if exponents are digits, these are the unicode codepoints for exponents
+        // Matches CPython: Numeric_Type=Digit or Numeric_Type=Decimal (covers superscripts, etc.).
         !self.data.is_empty()
             && self.char_all(|c| {
-                [NumericType::Digit, NumericType::Decimal].contains(&NumericType::for_char(c))
+                matches!(
+                    NumericType::for_char(c),
+                    NumericType::Digit | NumericType::Decimal
+                )
             })
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 959 - 979, Replace the
small-array-based enum membership checks in the isnumeric and isdigit methods
with matches! pattern matching to avoid allocating/linear-search semantics: in
isnumeric (fn isnumeric) change the closure that currently uses
[NumericType::Decimal, NumericType::Digit,
NumericType::Numeric].contains(&NumericType::for_char(c)) to a matches! on
NumericType::for_char(c) matching Decimal | Digit | Numeric; similarly in
isdigit (fn isdigit) replace the [NumericType::Digit,
NumericType::Decimal].contains(&NumericType::for_char(c)) check with
matches!(NumericType::for_char(c), Digit | Decimal). Also trim the stale portion
of the comment inside isdigit that mentions explicit superscript codepoints so
the comment only documents that isdigit checks digit/exponent categories if
kept.

948-957: Consider consolidating isalnum to use GeneralCategoryGroup::Number for consistency.

GeneralCategoryGroup::Letter ∪ DecimalNumber + c.is_numeric() correctly covers L*, Nd, Nl, and No (matching CPython's requirements). However, this method mixes ICU and Rust stdlib classification, while sibling methods (isalpha, isdecimal, isnumeric) use ICU exclusively. GeneralCategoryGroup::Number (available in icu_properties v2) expands to exactly Nd | Nl | No, making the refactor below functionally equivalent while consolidating to a single source of truth:

♻️ Optional simplification
 fn isalnum(&self) -> bool {
     !self.data.is_empty()
         && self.char_all(|c| {
             GeneralCategoryGroup::Letter
-                .union(GeneralCategoryGroup::DecimalNumber)
+                .union(GeneralCategoryGroup::Number)
                 .contains(GeneralCategory::for_char(c))
-                || c.is_numeric()
         })
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 948 - 957, The isalnum
implementation in the isalnum method should use ICU-only classification for
consistency: replace the current check that unions GeneralCategoryGroup::Letter
and GeneralCategoryGroup::DecimalNumber and also calls c.is_numeric() with a
single GeneralCategoryGroup::Number.contains(GeneralCategory::for_char(c))
combined with the existing Letter check (i.e., Letter ∪ Number) so char_all only
uses ICU properties; update the predicate inside str::isalnum (and keep the
!self.data.is_empty() guard) to use GeneralCategoryGroup::Number and remove the
c.is_numeric() call.