
Implement _tokenize and update tokenize from v3.14.3 by youknowone · Pull Request #7392 · RustPython/RustPython

ShaharNaveh

youknowone changed the title from "Implement _tokenize" to "Implement _tokenize and update tokenize from v3.14.3"

Mar 10, 2026

coderabbitai[bot]

Port from PR RustPython#6240 by ShaharNaveh, adapted to current codebase.
Uses ruff_python_parser for tokenization via TokenizerIter.
Replace per-line reparsing with single-pass tokenization:
- Read all lines via readline, parse once, yield tokens
- Fix token type values (COMMENT=65, NL=66, OP=55)
- Fix NEWLINE/NL end positions and implicit newline handling
- Fix DEDENT positions via look-ahead to next non-DEDENT token
- Handle FSTRING_MIDDLE brace unescaping ({{ → {, }} → })
- Emit implicit NL before ENDMARKER when source lacks trailing newline
- Raise IndentationError from lexer errors
- Remove 13 expectedFailure marks for now-passing tests
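
For reviewers who want to see the read-then-tokenize flow from the list above, here is a minimal Python sketch. It is not the code in this PR: the helper names `read_all_lines`, `tokenize_once`, and `describe` are hypothetical, and the stdlib `tokenize.generate_tokens` stands in for the single parse that the PR performs with ruff_python_parser. Run on CPython, it prints the reference token stream (types, text, start and end positions) for a source that ends inside an indented block without a trailing newline, which is the kind of input the NEWLINE/NL and DEDENT position fixes are about.

```python
import io
import token
import tokenize


def read_all_lines(readline):
    """Phase 1 (hypothetical helper): drain a readline callable into one string."""
    lines = []
    while True:
        line = readline()
        if not line:  # readline signals EOF with an empty string
            break
        lines.append(line)
    return "".join(lines)


def tokenize_once(readline):
    """Phase 2 (hypothetical helper): tokenize the whole source in one pass."""
    source = read_all_lines(readline)
    # Assumption: the stdlib tokenizer stands in for the real single-pass lexer.
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


def describe(tokens):
    """Render tokens as (type name, text, start, end) tuples."""
    return [(token.tok_name[t.type], t.string, t.start, t.end) for t in tokens]


# An indented block with no trailing newline at the end of the source.
source = "if x:\n    y = 1"
rows = describe(tokenize_once(io.StringIO(source).readline))
for row in rows:
    print(row)
```

Looking up names through `token.tok_name` rather than hard-coding numbers keeps the sketch independent of the numeric token values, which differ between Python versions.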

youknowone added a commit to youknowone/RustPython that referenced this pull request

Mar 22, 2026
* Base implementation of _tokenize module

Port from PR RustPython#6240 by ShaharNaveh, adapted to current codebase.
Uses ruff_python_parser for tokenization via TokenizerIter.

* Update tokenize from v3.14.3

* Rewrite _tokenize with 2-phase model

Replace per-line reparsing with single-pass tokenization:
- Read all lines via readline, parse once, yield tokens
- Fix token type values (COMMENT=65, NL=66, OP=55)
- Fix NEWLINE/NL end positions and implicit newline handling
- Fix DEDENT positions via look-ahead to next non-DEDENT token
- Handle FSTRING_MIDDLE brace unescaping ({{ → {, }} → })
- Emit implicit NL before ENDMARKER when source lacks trailing newline
- Raise IndentationError from lexer errors
- Remove 13 expectedFailure marks for now-passing tests

---------

Co-authored-by: ShaharNaveh <shaharnaveh@users.noreply.github.com>
Co-authored-by: CPython Developers <>