Fix stdio encoding by youknowone · Pull Request #6617 · RustPython/RustPython
📝 Walkthrough
Walkthrough
This pull request adds support for the PYTHONIOENCODING environment variable, enabling customization of standard input/output stream encoding and error handling. Changes include environment variable parsing, new settings fields, VM initialization updates for ASCII and UTF-8 encoding registration, and corresponding module aliasing.
Changes
| Cohort / File(s) | Summary |
|---|---|
Spell Check Dictionary .cspell.dict/python-more.txt |
Added PYTHONIOENCODING token to recognized dictionary entries. |
Encoding Module Registration crates/vm/Lib/core_modules/encodings_ascii.py |
New module that references the standard ASCII encoding implementation; used for frozen stdlib mode. |
VM Initialization & Alias Resolution crates/vm/src/vm/mod.rs |
Renamed import_utf8_encodings to import_ascii_utf8_encodings; now registers both ASCII and UTF-8 encodings with conditional module names (standard or frozen variants). Updated resolve_frozen_alias() to map encodings_ascii → encodings.ascii and encodings_utf_8 → encodings.utf_8. Enhanced TextIOWrapper creation to use stdio_encoding and stdio_errors settings; stderr continues using backslashreplace. |
Settings Configuration crates/vm/src/vm/setting.rs |
Added public fields stdio_encoding: Option<String> and stdio_errors: Option<String> to Settings struct; updated Default implementation. |
Environment Variable Parsing src/settings.rs |
Added parsing of PYTHONIOENCODING environment variable in parse_opts(), supporting both encoding:errors and encoding formats; populates corresponding Settings fields when env vars are not ignored. |
Sequence Diagram
sequenceDiagram
participant User as User/Environment
participant Parser as Settings Parser
participant Settings as Settings Struct
participant VM as VM Initialization
participant Encodings as Encoding Registry
participant Stdio as TextIOWrapper
User->>Parser: PYTHONIOENCODING=utf-8:strict
Parser->>Settings: Parse & populate stdio_encoding, stdio_errors
Settings->>VM: Initialize with Settings
VM->>Encodings: import_ascii_utf8_encodings()
Encodings->>Encodings: Register ASCII codec (encodings_ascii)
Encodings->>Encodings: Register UTF-8 codec (encodings_utf_8)
VM->>Stdio: Create stdin/stdout TextIOWrapper
Stdio->>Stdio: Apply stdio_encoding & stdio_errors from Settings
VM->>Stdio: Create stderr TextIOWrapper
Stdio->>Stdio: Use hardcoded backslashreplace error handling
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~22 minutes
Possibly related PRs
- enhance error codec and exit code #6602: Modifies VM stdio/TextIOWrapper initialization and stderr error handling in the same file (
crates/vm/src/vm/mod.rs), suggesting overlapping functionality or coordination needed.
Suggested reviewers
- ShaharNaveh
Poem
🐰 A bunny hops through code so fine,
With encodings now in perfect line!
UTF-8, ASCII, errors too—
PYTHONIOENCODING sees it through! 🌈
Stdio flows where settings guide,
With Python's quirks bunny-approved! 🥕
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Fix stdio encoding' accurately reflects the main changes across multiple files that implement stdio encoding support, including adding PYTHONIOENCODING environment variable processing and propagating encoding/error settings throughout the VM initialization and stdio TextIOWrapper construction. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
- 📝 Generate docstrings
📜 Recent review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_tarfile.pyis excluded by!Lib/**
📒 Files selected for processing (5)
.cspell.dict/python-more.txtcrates/vm/Lib/core_modules/encodings_ascii.pycrates/vm/src/vm/mod.rscrates/vm/src/vm/setting.rssrc/settings.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
src/settings.rscrates/vm/src/vm/setting.rscrates/vm/src/vm/mod.rs
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code
Files:
crates/vm/Lib/core_modules/encodings_ascii.py
🪛 Flake8 (7.3.0)
crates/vm/Lib/core_modules/encodings_ascii.py
[error] FileNotFoundError: [Errno 2] No such file or directory: 'crates/vm/Lib/core_modules/encodings_ascii.py'
(E902)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (8)
.cspell.dict/python-more.txt (1)
181-181: LGTM!The addition of PYTHONIOENCODING to the spell-check dictionary is correct and properly positioned alphabetically.
src/settings.rs (1)
301-316: LGTM!The PYTHONIOENCODING parsing correctly handles both
encoding:errorsandencodingformats, with proper empty-string checks to avoid setting empty values. The logic correctly respects theignore_environmentsetting.crates/vm/src/vm/setting.rs (2)
115-118: LGTM!The new
stdio_encodingandstdio_errorsfields are properly typed, documented, and positioned alongside related stdio settings.
202-203: LGTM!The default initialization to
Noneis correct, allowing the VM to use default encoding behavior when PYTHONIOENCODING is not set.crates/vm/src/vm/mod.rs (3)
341-347: LGTM! Correct Python stdio behavior.The stdio initialization correctly propagates
stdio_encodingandstdio_errorsfrom settings. The special handling for stderr (always usingbackslashreplace) matches Python's standard behavior, as stderr should never fail on encoding errors.
1029-1030: LGTM!The alias mappings correctly resolve the core_modules names to their standard library equivalents, maintaining consistency with the conditional module naming in
import_ascii_utf8_encodings.
264-291: Conditional module paths verified and working correctly.The freeze-stdlib conditional logic is properly implemented. Both code paths are correctly configured:
- With
freeze-stdlibenabled: Uses dotted names (encodings.ascii,encodings.utf_8) from the full stdlib- Without
freeze-stdlib: Uses underscored names (encodings_ascii,encodings_utf_8) via symlinks in core_modules that point back to the actual encoding implementationsThe implementation properly registers both ASCII and UTF-8 codecs for all configurations.
crates/vm/Lib/core_modules/encodings_ascii.py (1)
1-1: This is a symbolic link and follows the intended pattern in the codebase.The file
crates/vm/Lib/core_modules/encodings_ascii.pyis a symbolic link pointing to../../../../Lib/encodings/ascii.py, which is a tracked file in the repository. This is a standard and intentional symlink pattern used incore_modulesfor module aliasing (see alsocodecs.py,copyreg.py,encodings_utf_8.py). The symlink target does not resolve in the current sparse checkout, but this is expected—the target files exist in the repository and will be accessible in a full checkout.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.