Base implementation of `_tokenize` module by ShaharNaveh · Pull Request #6240 · RustPython/RustPython

ShaharNaveh

youknowone added a commit to youknowone/RustPython that referenced this pull request

Mar 6, 2026

Port _tokenize.TokenizerIter from PR RustPython#6240 (ShaharNaveh), adapted to
current codebase. Uses ruff_python_parser for tokenization.
Update Lib/tokenize.py from cpython/Lib/tokenize.py.
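For context, here is a sketch of how the ported iterator is reached from Python code. As far as we know, in CPython 3.12 and later the pure-Python `tokenize.generate_tokens` delegates to the private `_tokenize.TokenizerIter`. Exercising the public API is therefore a reasonable smoke test of the port. The helper name `token_summary` is hypothetical. Only `io.StringIO`, `tokenize.generate_tokens` and `tokenize.tok_name` come from the standard library.

```python
import io
import tokenize


def token_summary(source):
    """Return (token type name, token string) pairs for `source` (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    # Print one token per line, e.g. the NAME, OP and NUMBER tokens of an assignment.
    for name, text in token_summary("x = 1\n"):
        print(name, repr(text))
```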

youknowone pushed a commit to youknowone/RustPython that referenced this pull request

Mar 6, 2026

Port from PR RustPython#6240 by ShaharNaveh, adapted to current codebase.
Uses ruff_python_parser for tokenization via TokenizerIter.

youknowone pushed a commit to youknowone/RustPython that referenced this pull request

Mar 6, 2026

youknowone pushed a commit to youknowone/RustPython that referenced this pull request

Mar 9, 2026

youknowone pushed a commit to youknowone/RustPython that referenced this pull request

Mar 9, 2026

youknowone pushed a commit to youknowone/RustPython that referenced this pull request

Mar 10, 2026

youknowone added a commit that referenced this pull request

Mar 10, 2026

* Base implementation of _tokenize module

Port from PR #6240 by ShaharNaveh, adapted to current codebase.
Uses ruff_python_parser for tokenization via TokenizerIter.

* Update tokenize from v3.14.3

* Rewrite _tokenize with 2-phase model

Replace per-line reparsing with single-pass tokenization:
- Read all lines via readline, parse once, yield tokens
- Fix token type values (COMMENT=65, NL=66, OP=55)
- Fix NEWLINE/NL end positions and implicit newline handling
- Fix DEDENT positions via look-ahead to next non-DEDENT token
- Handle FSTRING_MIDDLE brace unescaping ({{ → {, }} → })
- Emit implicit NL before ENDMARKER when source lacks trailing newline
- Raise IndentationError from lexer errors
- Remove 13 expectedFailure marks for now-passing tests
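The two-phase model in the list above can be loosely sketched in Python. This is an analogy only. The real implementation is Rust code driving ruff_python_parser, whereas here the stdlib tokenizer stands in for the single parse. The names `read_all_lines` and `two_phase_tokens` are hypothetical. Phase 1 drains `readline` until EOF. Phase 2 tokenizes the joined source once and yields tokens lazily.

```python
import io
import tokenize


def read_all_lines(readline):
    """Phase 1: call readline until EOF, signalled by '' or StopIteration."""
    lines = []
    while True:
        try:
            line = readline()
        except StopIteration:  # iterator-backed readlines end this way
            break
        if not line:
            break
        lines.append(line)
    return lines


def two_phase_tokens(readline):
    """Phase 2: tokenize the whole buffer in one pass and yield the tokens.

    Being a generator, phase 1 runs on the first next() call.
    """
    source = "".join(read_all_lines(readline))
    # Stand-in for the single ruff_python_parser pass over the full buffer.
    yield from tokenize.generate_tokens(io.StringIO(source).readline)
```

Reading everything up front presumably makes look-ahead (as in the DEDENT fix) straightforward. The cost is holding the whole source in memory instead of re-parsing line by line.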
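Numeric token types are version-specific, which is presumably why COMMENT, NL and OP needed fixing. As far as we can tell, CPython 3.14 added TSTRING_START/MIDDLE/END (template strings) ahead of COMMENT and NL, shifting their values. Below is a minimal consistency check against the running interpreter's `token` module. `EXPECTED_314` and `token_value_mismatches` are hypothetical names, and the numbers are the ones quoted in the list above.

```python
import sys
import token

# Values quoted in the commit message for CPython 3.14's token module.
EXPECTED_314 = {"OP": 55, "COMMENT": 65, "NL": 66}


def token_value_mismatches(expected=EXPECTED_314):
    """Return {name: (expected, actual)} for names whose value differs here."""
    return {
        name: (value, getattr(token, name))
        for name, value in expected.items()
        if getattr(token, name) != value
    }


if sys.version_info[:2] == (3, 14):
    # Only meaningful on 3.14; other versions number these tokens differently.
    assert token_value_mismatches() == {}
```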
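The "DEDENT positions via look-ahead" item matches CPython's convention. A DEDENT is an empty token placed at the start of the next non-DEDENT token, so consecutive DEDENTs share one position. Below is a sketch of such a pass over `tokenize.TokenInfo` tuples. `fix_dedent_positions` is a hypothetical name, not the Rust code in this PR.

```python
import io
import token
import tokenize


def fix_dedent_positions(tokens):
    """Place every DEDENT at the start of the next non-DEDENT token.

    Walks backwards so that a run of consecutive DEDENTs all receive the
    position of the first real token after the run.
    """
    fixed = list(tokens)
    next_start = None
    for i in range(len(fixed) - 1, -1, -1):
        tok = fixed[i]
        if tok.type == token.DEDENT:
            if next_start is not None:
                fixed[i] = tok._replace(start=next_start, end=next_start)
        else:
            next_start = tok.start
    return fixed
```

Walking backwards keeps the pass linear and avoids rescanning ahead for every DEDENT.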
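In f-string source, a literal brace is written doubled (`{{`, `}}`). Judging from the FSTRING_MIDDLE item above, the token value is expected with the doubles collapsed, so a lexer that reports raw source text needs an unescaping step. The sketch below assumes the input is the raw slice of a literal part, where braces only ever appear doubled. `unescape_fstring_middle` is a hypothetical helper.

```python
def unescape_fstring_middle(text):
    """Collapse doubled braces in an f-string literal segment: '{{' -> '{', '}}' -> '}'."""
    # The two replacements cannot interact: the first only produces '{',
    # and the second only touches '}'.
    return text.replace("{{", "{").replace("}}", "}")
```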
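The implicit-newline handling can be observed with CPython's own tokenizer, which is presumably the reference the RustPython output is tested against. When the source lacks a trailing newline, a newline-type token (NEWLINE, or NL in some cases) is still emitted before ENDMARKER. `last_token_types` is a hypothetical helper.

```python
import io
import tokenize


def last_token_types(source, n=2):
    """Return the type names of the final `n` tokens produced for `source`."""
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    return [tokenize.tok_name[t.type] for t in toks[-n:]]


if __name__ == "__main__":
    print(last_token_types("x = 1"))    # source without a trailing newline
    print(last_token_types("x = 1\n"))  # source with an explicit trailing newline
```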
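In the CPython versions we are aware of, an inconsistent dedent surfaces from `tokenize` as `IndentationError` rather than the generic `tokenize.TokenError`. The IndentationError item above mirrors this behaviour. `indentation_error_message` is a hypothetical helper that probes for it.

```python
import io
import tokenize


def indentation_error_message(source):
    """Tokenize `source`; return the IndentationError message, or None if none is raised."""
    try:
        for _ in tokenize.generate_tokens(io.StringIO(source).readline):
            pass
    except IndentationError as exc:
        return exc.msg
    return None
```

Other tokenizer failures (for example `tokenize.TokenError`) are deliberately not caught and propagate to the caller.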

---------

Co-authored-by: ShaharNaveh <shaharnaveh@users.noreply.github.com>
Co-authored-by: CPython Developers <>

youknowone pushed a commit to youknowone/RustPython that referenced this pull request

Mar 19, 2026

youknowone pushed a commit to youknowone/RustPython that referenced this pull request

Mar 22, 2026

youknowone added a commit to youknowone/RustPython that referenced this pull request

Mar 22, 2026
