◐ Shell
reader mode source ↗
Skip to content

windows codecs#6337

Merged
youknowone merged 3 commits into
RustPython:mainfrom
youknowone:win-codecs
Dec 7, 2025
Merged

windows codecs#6337
youknowone merged 3 commits into
RustPython:mainfrom
youknowone:win-codecs

Conversation

@youknowone

@youknowone youknowone commented Dec 7, 2025

Copy link
Copy Markdown
Member

Summary by CodeRabbit

  • New Features
    • Windows-specific MBCS and OEM encoding/decoding implementations added, exposing explicit argument forms and return shapes on Windows.
  • Bug Fixes
    • Preserved cross-platform behavior by routing non-Windows calls to the existing codecs to avoid regressions.
  • Chores
    • Introduced structured argument handling for the new platform-specific codec paths to standardize input/output handling.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Dec 7, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This change adds Windows-specific implementations for MBCS and OEM codec encode/decode functions with dedicated argument structs and typed returns, and retains non-Windows fallbacks that delegate to Python codecs for portability.

Changes

Cohort / File(s) Summary
Windows-specific codec implementations & argument structures
crates/vm/src/stdlib/codecs.rs
Adds four argument structs (MbcsEncodeArgs, MbcsDecodeArgs, OemEncodeArgs, OemDecodeArgs) with FromArgs. Implements Windows-specific mbcs_encode, mbcs_decode, oem_encode, oem_decode functions that accept those structs and return typed PyResult tuples. Non-Windows paths delegate to Python codecs via delegate_pycodecs!.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • Review Windows-specific API usage and platform gating in crates/vm/src/stdlib/codecs.rs
  • Verify FromArgs field types and Python-facing signatures match expected calling conventions
  • Confirm return types (PyResult<(Vec<u8>, usize)>, PyResult<(String, usize)>) integrate with VM error/return handling

Poem

🐇 I hopped into code with a curious nose,
Four tiny structs in tidy rows,
MBCS and OEM, I taught them to sing,
Fallbacks in place, so all platforms ring,
I nibble bugs away — rejoice, bytes and prose!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'windows codecs' is vague and generic, lacking specificity about the actual changes made to codec functionality. Consider a more descriptive title such as 'Add Windows-specific implementations for mbcs and oem codecs' to clarify the main change.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review December 7, 2025 14:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 590da47 and c623c56.

📒 Files selected for processing (1)
  • crates/vm/src/stdlib/codecs.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/codecs.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (5)
crates/vm/src/stdlib/codecs.rs (5)

228-235: LGTM!

The argument struct follows the established pattern for encode operations and is correctly configured.


316-326: Verify that the unused final parameter aligns with Python's codec API.

The r#final parameter is marked as dead code. While this might be intentional for API compatibility with Python's incremental decoder interface, please confirm this matches the expected signature for mbcs_decode in CPython.


417-424: LGTM!

The argument struct is correctly defined and follows the established pattern.


505-515: Verify that the unused final parameter aligns with Python's codec API.

Same as MbcsDecodeArgs, the r#final parameter is unused. Confirm this matches the expected signature for oem_decode in CPython.


310-314: LGTM!

The non-Windows fallback implementations correctly delegate to Python's _pycodecs module, ensuring cross-platform compatibility. The configuration attributes are properly set.

Also applies to: 411-415, 499-503, 600-604

@ShaharNaveh ShaharNaveh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Gonna approve it, although I don't understand anything in Windows tbh:/

@youknowone

Copy link
Copy Markdown
Member Author

oh, sorry. I will try not to bring you to windows topics too much

@ShaharNaveh

Copy link
Copy Markdown
Contributor

Windows is fine, it's just that my knowledge there is so little (compared to *nix systems), I don't think I can provide useful insights 😅

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
crates/vm/src/stdlib/codecs.rs (4)

237-316: Non-strict error modes lack proper implementation.

The function only implements "strict" error handling (lines 308-312). Non-strict modes ("ignore", "replace", "xmlcharrefreplace", "backslashreplace") silently receive Windows default behavior rather than Python codec semantics, and invalid error mode names are not validated.

As per past review comments, proper error mode implementations should:

  • Validate the errors parameter against allowed modes
  • Implement "ignore" to skip unmappable characters
  • Implement "replace" with appropriate substitution
  • Implement "backslashreplace" and "xmlcharrefreplace" with proper escape sequences

439-518: Extract shared implementation to eliminate duplication.

This function duplicates mbcs_encode (lines 237-316), differing only in the code page constant (CP_OEMCP vs CP_ACP) and error messages. Additionally, it has the same incomplete error handling for non-strict modes.

As per past review comments and coding guidelines, extract a shared helper function:

fn windows_encode(
    s: &str,
    char_len: usize,
    errors: &str,
    code_page: u32,
    codec_name: &str,
    vm: &VirtualMachine,
) -> PyResult<(Vec<u8>, usize)> {
    // Shared implementation with code_page parameter
}

Then call from both mbcs_encode with CP_ACP and oem_encode with CP_OEMCP.


538-624: Extract shared implementation and fix unused errors parameter.

This function duplicates mbcs_decode (lines 336-422), differing only in the code page constant. Additionally, it has the same unused _errors parameter (line 545).

As per past review comments and coding guidelines, extract a shared helper:

fn windows_decode(
    data: &[u8],
    code_page: u32,
    errors: &str,
    codec_name: &str,
    vm: &VirtualMachine,
) -> PyResult<(String, usize)> {
    // Shared implementation
}

Call from both functions with appropriate code page constants. Also remove the underscore prefix from the errors parameter and use it to control decode behavior.


336-422: Unused errors parameter prevents proper error handling control.

The _errors parameter (line 343) is extracted but never used to control decode behavior. The function unconditionally tries MB_ERR_INVALID_CHARS first (line 355) and falls back to flags=0 (line 368), regardless of the user's requested error mode. This means users cannot select strict-only behavior or other specific error handling strategies.

Consider removing the underscore prefix and using the errors value to control whether to attempt strict validation or directly use replacement behavior:

-    let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict");
+    let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict");

Then use errors to determine flags:

  • If errors == "strict", use only MB_ERR_INVALID_CHARS and fail on invalid sequences
  • Otherwise, use flags=0 for replacement behavior
🧹 Nitpick comments (3)
crates/vm/src/stdlib/codecs.rs (3)

324-334: Consider implementing the final parameter.

The r#final parameter is currently unused (line 333). While this matches the function signature for codec compatibility, consider whether incremental decoding support should be implemented for completeness.


430-437: LGTM with note on duplication.

The OemEncodeArgs struct is identical to MbcsEncodeArgs. Consider whether these could be consolidated if the encode functions are refactored to share implementation.


526-536: LGTM with duplication note.

The OemDecodeArgs struct duplicates MbcsDecodeArgs. If decode functions are refactored to share implementation, these structs could potentially be consolidated.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86e1a0b and cdf39bf.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_subprocess.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/vm/src/stdlib/codecs.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/codecs.rs
🧬 Code graph analysis (1)
crates/vm/src/stdlib/codecs.rs (1)
crates/vm/src/stdlib/nt.rs (1)
  • from_utf16 (327-329)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (1)
crates/vm/src/stdlib/codecs.rs (1)

228-235: LGTM!

The MbcsEncodeArgs struct follows the established pattern for codec argument structures.

Hide details View details @youknowone youknowone merged commit cab41c8 into RustPython:main Dec 7, 2025
13 checks passed
@youknowone youknowone deleted the win-codecs branch December 7, 2025 16:51
@coderabbitai coderabbitai Bot mentioned this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants