impl more nt#6984
Conversation
📝 WalkthroughWalkthroughWindows-specific codecs and extensive Windows stdlib changes were added (code-page encode/decode, codec delegation), large rewrites to nt.rs (symlink, path normalization, reparse-point and attribute APIs), a tiny exceptions newline-trim tweak, and tighter env-var validation. Changes
Sequence Diagram(s)sequenceDiagram
participant Py as Python caller
participant VM as VirtualMachine/_codecs
participant Sub as _codecs_windows (submodule)
participant Win as Windows API
Py->>VM: call codec function (e.g., code_page_encode)
alt on non-Windows
VM->>VM: delegate_pycodecs! -> route to registered PyCodecs
VM-->>Py: return result
else on Windows
VM->>Sub: call Windows-specific wrapper
Sub->>Win: WideCharToMultiByte / MultiByteToWideChar
Win-->>Sub: bytes/string + status
Sub-->>VM: result
VM-->>Py: return result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/nt.rs (1)
2262-2294: ⚠️ Potential issue | 🟠 MajorChange ValueError to OSError for non-symlink reparse tags.
readlinkshould raiseOSError(withEINVALerrno), notValueError, to match POSIX semantics when the reparse point is not a symlink or junction.The proposed fix references
ERROR_NOT_A_REPARSE_POINT, which does not exist inwindows_sys. Instead, uselibc::EINVAL:🛠️ Corrected error mapping
- } else { - return Err(vm.new_value_error("not a symbolic link".to_owned())); - }; + } else { + let err = io::Error::from_raw_os_error(libc::EINVAL); + return Err(OSErrorBuilder::with_filename(&err, path, vm)); + };
🤖 Fix all issues with AI agents
In `@crates/vm/src/stdlib/codecs.rs`:
- Around line 770-865: The decode currently falls back to non-strict decoding
whenever the first MultiByteToWideChar call returns 0; change code in the
code_page_decode path to check args.errors (the _errors variable) when
strict_flags uses MB_ERR_INVALID_CHARS: if size == 0 and _errors == "strict"
then return vm.new_unicode_decode_error(...) (use the same error formatting as
other branches) instead of falling back to call MultiByteToWideChar with flags
0; otherwise (when _errors != "strict") keep the existing fallback logic.
Reference symbols: args.errors / _errors, strict_flags, MB_ERR_INVALID_CHARS,
MultiByteToWideChar, vm.new_unicode_decode_error, and the code_page_decode
return paths.
In `@crates/vm/src/stdlib/nt.rs`:
- Around line 573-666: The disk_device check in _test_file_type_by_name
incorrectly omits network filesystems so regular files on UNC/share paths aren't
treated as disk devices; update the matches!(...) in _test_file_type_by_name to
include FILE_DEVICE_NETWORK_FILE_SYSTEM (import it from
windows_sys::Win32::System::Ioctl) alongside FILE_DEVICE_DISK,
FILE_DEVICE_VIRTUAL_DISK, and FILE_DEVICE_CD_ROM so _test_info(...) correctly
treats network-mounted regular files as disk devices and allows the rest of the
logic (including the handle-based fallback) to run.
- Around line 1287-1333: In _getfinalpathname, the CreateFileW call opens the
file with access 0 and share mode 0 which causes sharing violations; change the
access parameter to FILE_READ_ATTRIBUTES and the sharemode parameter to
FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE (matching other stdlib
patterns) when calling CreateFileW so the handle is opened permissively for
metadata reads; update the CreateFileW invocation in the _getfinalpathname
function accordingly and keep the rest of the logic
(GetFinalPathNameByHandleW/CloseHandle) unchanged.
🧹 Nitpick comments (1)
crates/vm/src/stdlib/nt.rs (1)
1585-1750: Add tests for the new Windows normpath implementation.This is a sizeable port; consider tests for UNC/device paths, drive-relative forms, dot/dotdot handling, trailing separators, and bytes input to guard parity.
Sorry, something went wrong.
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin nt |
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/codecs.rs (2)
465-521: ⚠️ Potential issue | 🟠 MajorHonor
errors="strict"inmbcs_decodebefore falling back.The
_errorsvariable is parsed but never used. Currently, when the strictMultiByteToWideCharpass withMB_ERR_INVALID_CHARSfails, the code unconditionally retries without the flag, allowing invalid bytes to pass through even in strict mode. In strict mode, this should raise an error instead.🛠️ Proposed fix
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); @@ -488,6 +488,11 @@ mod windows_codecs { if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'mbcs' codec can't decode bytes: invalid character".to_owned(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior)
667-723: ⚠️ Potential issue | 🟠 MajorHonor
errors="strict"inoem_decodeand other decode functions.At line 667,
_errorsis parsed but never used. When the strictMultiByteToWideCharpass fails (size == 0), the code unconditionally retries withoutMB_ERR_INVALID_CHARS, bypassing strict error checking. This violates codec semantics:errors="strict"should raise an error on invalid bytes, not silently fall back to lenient decoding.The same issue exists in
mbcs_decode(line 465) andcode_page_decode(line 895). By contrast, the corresponding encode functions (oem_encode,code_page_encode) correctly use theerrorsparameter to control behavior in strict mode.🛠️ Proposed fix for oem_decode
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); if data.is_empty() { return Ok((String::new(), 0)); } // Get the required buffer size for UTF-16 let size = unsafe { MultiByteToWideChar( CP_OEMCP, MB_ERR_INVALID_CHARS, data.as_ptr().cast(), len as i32, std::ptr::null_mut(), 0, ) }; if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'oem' codec can't decode bytes: invalid character".to_owned(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior)Apply the same logic to
mbcs_decodeandcode_page_decode.
Sorry, something went wrong.
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] lib: cpython/Lib/io.py dependencies:
dependent tests: (87 tests)
[ ] test: cpython/Lib/test/test_cmd_line_script.py (TODO: 16) dependencies: dependent tests: (no tests depend on cmd_line_script) [x] lib: cpython/Lib/codecs.py dependencies:
dependent tests: (105 tests)
[ ] lib: cpython/Lib/compileall.py dependencies:
dependent tests: (1 tests)
[ ] lib: cpython/Lib/concurrent dependencies:
dependent tests: (11 tests)
[x] test: cpython/Lib/test/test_exceptions.py (TODO: 27) dependencies: dependent tests: (no tests depend on exception) [x] lib: cpython/Lib/os.py dependencies:
dependent tests: (160 tests)
[x] test: cpython/Lib/test/test_posix.py (TODO: 4) dependencies: dependent tests: (no tests depend on posix) [x] lib: cpython/Lib/runpy.py dependencies:
dependent tests: (3 tests)
[ ] lib: cpython/Lib/test/support dependencies:
dependent tests: (no tests depend on support) [ ] lib: cpython/Lib/subprocess.py dependencies:
dependent tests: (52 tests)
[x] lib: cpython/Lib/weakref.py dependencies:
dependent tests: (148 tests)
[ ] lib: cpython/Lib/zipfile dependencies:
dependent tests: (8 tests)
Legend:
|
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/codecs.rs (2)
447-502: ⚠️ Potential issue | 🟠 Major
errors="strict"isn't enforced on decode failures inmbcs_decode.The
_errorsvariable (line 447) is extracted but never used. WhenMultiByteToWideCharwithMB_ERR_INVALID_CHARSreturns 0, the code unconditionally falls back to replacement decoding (lines 468-502). This means invalid bytes won't raise aUnicodeDecodeErroreven whenerrors="strict".🛠️ Proposed fix
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); @@ if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'mbcs' codec can't decode bytes: invalid character".to_string(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior) let size = unsafe {
637-692: ⚠️ Potential issue | 🟠 Major
errors="strict"isn't enforced on decode failures inoem_decode.Same issue as
mbcs_decode: the_errorsvariable (line 637) is unused, and the code unconditionally falls back to replacement decoding when the strict check fails. This violates Python's codec error handling semantics.🛠️ Proposed fix
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); @@ if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'oem' codec can't decode bytes: invalid character".to_string(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior) let size = unsafe {
🧹 Nitpick comments (1)
crates/vm/src/stdlib/codecs.rs (1)
338-348: Redundant#[cfg(windows)]attributes inside already-gated module.The structs and functions inside
_codecs_windows(lines 338, 347, 428, 440, 528, 537, 618, 630, 718, 729, 828, 842) have#[cfg(windows)]attributes, but the enclosing module at line 332 is already gated by#[cfg(windows)]. These inner attributes are redundant and can be removed for cleaner code.
Sorry, something went wrong.
c045593
into
RustPython:main
Feb 4, 2026
Summary by CodeRabbit
New Features
Bug Fixes