Use Unicode properties for alnum, alpha, etc. by joshuamegnauth54 · Pull Request #7626 · RustPython/RustPython
959-979: Prefer matches! over array .contains(&..) for small enum sets.
[NumericType::Decimal, NumericType::Digit, NumericType::Numeric].contains(&NumericType::for_char(c)) constructs a stack array and does a linear search on every character. matches! is more idiomatic, reads better, and the compiler lowers it to a single jump table / bitwise test. Same applies to isdigit on Line 977.
Also: the lingering comment on Line 974 ("python's isdigit also checks if exponents are digits, these are the unicode codepoints for exponents") referred to the old hardcoded superscript list; with NumericType::Digit handling those uniformly, the tail of that sentence is stale and could be trimmed.
♻️ Proposed cleanup
#[pymethod]
fn isnumeric(&self) -> bool {
!self.data.is_empty()
&& self.char_all(|c| {
- [
- NumericType::Decimal,
- NumericType::Digit,
- NumericType::Numeric,
- ]
- .contains(&NumericType::for_char(c))
+ matches!(
+ NumericType::for_char(c),
+ NumericType::Decimal | NumericType::Digit | NumericType::Numeric
+ )
})
}
#[pymethod]
fn isdigit(&self) -> bool {
- // python's isdigit also checks if exponents are digits, these are the unicode codepoints for exponents
+ // Matches CPython: Numeric_Type=Digit or Numeric_Type=Decimal (covers superscripts, etc.).
!self.data.is_empty()
&& self.char_all(|c| {
- [NumericType::Digit, NumericType::Decimal].contains(&NumericType::for_char(c))
+ matches!(
+ NumericType::for_char(c),
+ NumericType::Digit | NumericType::Decimal
+ )
})
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/str.rs` around lines 959 - 979, Replace the
small-array-based enum membership checks in the isnumeric and isdigit methods
with matches! pattern matching to avoid allocating/linear-search semantics: in
isnumeric (fn isnumeric) change the closure that currently uses
[NumericType::Decimal, NumericType::Digit,
NumericType::Numeric].contains(&NumericType::for_char(c)) to a matches! on
NumericType::for_char(c) matching Decimal | Digit | Numeric; similarly in
isdigit (fn isdigit) replace the [NumericType::Digit,
NumericType::Decimal].contains(&NumericType::for_char(c)) check with
matches!(NumericType::for_char(c), Digit | Decimal). Also trim the stale portion
of the comment inside isdigit that mentions explicit superscript codepoints so
the comment only documents that isdigit checks digit/exponent categories if
kept.
948-957: Consider consolidating isalnum to use GeneralCategoryGroup::Number for consistency.
GeneralCategoryGroup::Letter ∪ DecimalNumber + c.is_numeric() correctly covers L*, Nd, Nl, and No (matching CPython's requirements). However, this method mixes ICU and Rust stdlib classification, while sibling methods (isalpha, isdecimal, isnumeric) use ICU exclusively. GeneralCategoryGroup::Number (available in icu_properties v2) expands to exactly Nd | Nl | No, making the refactor below functionally equivalent while consolidating to a single source of truth:
♻️ Optional simplification
fn isalnum(&self) -> bool {
!self.data.is_empty()
&& self.char_all(|c| {
GeneralCategoryGroup::Letter
- .union(GeneralCategoryGroup::DecimalNumber)
+ .union(GeneralCategoryGroup::Number)
.contains(GeneralCategory::for_char(c))
- || c.is_numeric()
})
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/str.rs` around lines 948 - 957, The isalnum
implementation in the isalnum method should use ICU-only classification for
consistency: replace the current check that unions GeneralCategoryGroup::Letter
and GeneralCategoryGroup::DecimalNumber and also calls c.is_numeric() with a
single GeneralCategoryGroup::Number.contains(GeneralCategory::for_char(c))
combined with the existing Letter check (i.e., Letter ∪ Number) so char_all only
uses ICU properties; update the predicate inside str::isalnum (and keep the
!self.data.is_empty() guard) to use GeneralCategoryGroup::Number and remove the
c.is_numeric() call.