◐ Shell
clean mode source ↗

no_std for common, pylib, codegen by youknowone · Pull Request #7056 · RustPython/RustPython

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the levenshtein_distance function around line 536
cat -n crates/common/src/str.rs | sed -n '500,560p'

Repository: RustPython/RustPython

Length of output: 2483


🏁 Script executed:

#!/bin/bash
# Search for test cases related to levenshtein_distance
rg -n 'levenshtein_distance|#\[test\]' crates/common/src/str.rs -A 5 -B 1

Repository: RustPython/RustPython

Length of output: 474


🏁 Script executed:

#!/bin/bash
# Look for any other occurrences of the pattern to understand the refactoring
git log --oneline -n 20 -- crates/common/src/str.rs

Repository: RustPython/RustPython

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Search for any test cases that verify levenshtein distance behavior
find . -name "*.rs" -type f -exec grep -l "levenshtein" {} \;

Repository: RustPython/RustPython

Length of output: 121


🏁 Script executed:

#!/bin/bash
# Look for test cases in the file itself
grep -n "test" crates/common/src/str.rs | head -20

Repository: RustPython/RustPython

Length of output: 143


🏁 Script executed:

#!/bin/bash
# Try to find CPython's implementation online or in comments
rg -n "CPython|suggestions\.c|UTF8_Edit" crates/ -A 3 -B 1

Repository: RustPython/RustPython

Length of output: 50380


🏁 Script executed:

#!/bin/bash
# Check if there's a git history we can inspect for this function
cd crates/common && git log --oneline -n 30 -- src/str.rs 2>/dev/null || echo "Git history not available"

Repository: RustPython/RustPython

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Check the suggestion.rs file which might have the levenshtein implementation
cat -n crates/vm/src/suggestion.rs | head -100

Repository: RustPython/RustPython

Length of output: 4334


🏁 Script executed:

#!/bin/bash
# Search for any tests that use levenshtein_distance or suggestion functionality
find . -name "*.rs" -type f | xargs grep -l "suggestion\|levenshtein" | head -10

Repository: RustPython/RustPython

Length of output: 324


🏁 Script executed:

#!/bin/bash
# Look at the suggestion.rs file to see if it uses the levenshtein_distance function
rg -n "levenshtein_distance" crates/vm/src/suggestion.rs -B 3 -A 3

Repository: RustPython/RustPython

Length of output: 542


🏁 Script executed:

#!/bin/bash
# Search for any tests that call levenshtein_distance or validate its output
find . -name "*.rs" -type f | xargs grep -l "test.*levenshtein\|levenshtein.*test" 2>/dev/null

Repository: RustPython/RustPython

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Look for any Python tests that might indirectly test levenshtein via suggestions
find . -name "test*.py" -o -name "*test.py" | xargs grep -l "AttributeError\|NameError\|suggestion" 2>/dev/null | head -5

Repository: RustPython/RustPython

Length of output: 214


🏁 Script executed:

#!/bin/bash
# Search for CPython reference or comments about the levenshtein implementation
rg -n "CPython|edit.*cost|levenshtein" crates/common/src/str.rs -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 329


🌐 Web query:

CPython _Py_UTF8_Edit_Cost implementation Levenshtein

💡 Result:

In CPython, _Py_UTF8_Edit_Cost() is implemented in Python/suggestions.c and is a bounded, weighted Levenshtein-distance routine over the strings’ UTF-8 byte sequences (it calls PyUnicode_AsUTF8AndSize() for both inputs). [1]

Key details of the implementation:

  • Operation costs

    • Insert/delete (“move”): MOVE_COST = 2 [1]
    • Substitution: 0 if equal; CASE_COST = 1 for ASCII case-only differences; otherwise MOVE_COST = 2 [1]
    • Case-only detection is optimized via LEAST_FIVE_BITS(c) == (c & 31) plus ASCII folding to confirm a case flip. [1]
  • Algorithm / optimizations

    • Trims common prefix and suffix before DP. [1]
    • Uses a single-row DP (in-place update) rather than a full matrix. [1]
    • Has a max_cost threshold: if the minimum value in a DP row exceeds max_cost, it bails out early and returns max_cost + 1. [1]
    • Hard limits: if either (trimmed) byte length exceeds MAX_STRING_SIZE (40), returns max_cost + 1 (treated as “too expensive”). [1]
  • max_cost == -1 behavior

    • Interpreted as “no practical bound”: it sets max_cost = MOVE_COST * max(len(a), len(b)) and then computes the distance. [1]

The function is declared as an internal API in Include/internal/pycore_pyerrors.h. [2]

Sources: Python/suggestions.c (Android git mirror of CPython) [1]; pycore_pyerrors.h declaration [2].


Line 536: Off-by-one bug in Levenshtein DP initialization.

The variable result represents the left-edge cost (deletion cost) for the current DP row. For row b_index+1, this should be (b_index + 1) * MOVE_COST, not b_index * MOVE_COST. The current code computes all edit distances with the wrong row-edge values, producing incorrect results. This matches CPython's _Py_UTF8_Edit_Cost implementation, which correctly uses (b_index + 1) * MOVE_COST.