windows codecs#6337
Conversation
WalkthroughThis change adds Windows-specific implementations for MBCS and OEM codec encode/decode functions with dedicated argument structs and typed returns, and retains non-Windows fallbacks that delegate to Python codecs for portability. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
a204467 to
c623c56
Compare
December 7, 2025 14:10
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
crates/vm/src/stdlib/codecs.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/stdlib/codecs.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Check Rust code with clippy
- GitHub Check: Check the WASM package and demo
🔇 Additional comments (5)
crates/vm/src/stdlib/codecs.rs (5)
228-235: LGTM!The argument struct follows the established pattern for encode operations and is correctly configured.
316-326: Verify that the unusedfinalparameter aligns with Python's codec API.The
r#finalparameter is marked as dead code. While this might be intentional for API compatibility with Python's incremental decoder interface, please confirm this matches the expected signature formbcs_decodein CPython.
417-424: LGTM!The argument struct is correctly defined and follows the established pattern.
505-515: Verify that the unusedfinalparameter aligns with Python's codec API.Same as
MbcsDecodeArgs, ther#finalparameter is unused. Confirm this matches the expected signature foroem_decodein CPython.
310-314: LGTM!The non-Windows fallback implementations correctly delegate to Python's
_pycodecsmodule, ensuring cross-platform compatibility. The configuration attributes are properly set.Also applies to: 411-415, 499-503, 600-604
Sorry, something went wrong.
c623c56 to
86e1a0b
Compare
December 7, 2025 14:43
ShaharNaveh
left a comment
There was a problem hiding this comment.
Gonna approve it, although I don't understand anything in Windows tbh:/
Sorry, something went wrong.
|
oh, sorry. I will try not to bring you to windows topics too much |
Sorry, something went wrong.
|
Windows is fine, it's just that my knowledge there is so little (compared to *nix systems), I don't think I can provide useful insights 😅 |
Sorry, something went wrong.
05e888e to
cdf39bf
Compare
December 7, 2025 16:23
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (4)
crates/vm/src/stdlib/codecs.rs (4)
237-316: Non-strict error modes lack proper implementation.The function only implements "strict" error handling (lines 308-312). Non-strict modes ("ignore", "replace", "xmlcharrefreplace", "backslashreplace") silently receive Windows default behavior rather than Python codec semantics, and invalid error mode names are not validated.
As per past review comments, proper error mode implementations should:
- Validate the errors parameter against allowed modes
- Implement "ignore" to skip unmappable characters
- Implement "replace" with appropriate substitution
- Implement "backslashreplace" and "xmlcharrefreplace" with proper escape sequences
439-518: Extract shared implementation to eliminate duplication.This function duplicates
mbcs_encode(lines 237-316), differing only in the code page constant (CP_OEMCPvsCP_ACP) and error messages. Additionally, it has the same incomplete error handling for non-strict modes.As per past review comments and coding guidelines, extract a shared helper function:
fn windows_encode( s: &str, char_len: usize, errors: &str, code_page: u32, codec_name: &str, vm: &VirtualMachine, ) -> PyResult<(Vec<u8>, usize)> { // Shared implementation with code_page parameter }Then call from both
mbcs_encodewithCP_ACPandoem_encodewithCP_OEMCP.
538-624: Extract shared implementation and fix unused errors parameter.This function duplicates
mbcs_decode(lines 336-422), differing only in the code page constant. Additionally, it has the same unused_errorsparameter (line 545).As per past review comments and coding guidelines, extract a shared helper:
fn windows_decode( data: &[u8], code_page: u32, errors: &str, codec_name: &str, vm: &VirtualMachine, ) -> PyResult<(String, usize)> { // Shared implementation }Call from both functions with appropriate code page constants. Also remove the underscore prefix from the
errorsparameter and use it to control decode behavior.
336-422: Unusederrorsparameter prevents proper error handling control.The
_errorsparameter (line 343) is extracted but never used to control decode behavior. The function unconditionally triesMB_ERR_INVALID_CHARSfirst (line 355) and falls back to flags=0 (line 368), regardless of the user's requested error mode. This means users cannot select strict-only behavior or other specific error handling strategies.Consider removing the underscore prefix and using the
errorsvalue to control whether to attempt strict validation or directly use replacement behavior:- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict");Then use
errorsto determine flags:
- If
errors == "strict", use onlyMB_ERR_INVALID_CHARSand fail on invalid sequences- Otherwise, use flags=0 for replacement behavior
🧹 Nitpick comments (3)
crates/vm/src/stdlib/codecs.rs (3)
324-334: Consider implementing thefinalparameter.The
r#finalparameter is currently unused (line 333). While this matches the function signature for codec compatibility, consider whether incremental decoding support should be implemented for completeness.
430-437: LGTM with note on duplication.The
OemEncodeArgsstruct is identical toMbcsEncodeArgs. Consider whether these could be consolidated if the encode functions are refactored to share implementation.
526-536: LGTM with duplication note.The
OemDecodeArgsstruct duplicatesMbcsDecodeArgs. If decode functions are refactored to share implementation, these structs could potentially be consolidated.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_subprocess.pyis excluded by!Lib/**
📒 Files selected for processing (1)
crates/vm/src/stdlib/codecs.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/stdlib/codecs.rs
🧬 Code graph analysis (1)
crates/vm/src/stdlib/codecs.rs (1)
crates/vm/src/stdlib/nt.rs (1)
from_utf16(327-329)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Check Rust code with clippy
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (1)
crates/vm/src/stdlib/codecs.rs (1)
228-235: LGTM!The
MbcsEncodeArgsstruct follows the established pattern for codec argument structures.
Sorry, something went wrong.
cab41c8
into
RustPython:main
Dec 7, 2025
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.