Add sys._stdlib_dir by youknowone · Pull Request #6798 · RustPython/RustPython
Summary by CodeRabbit
Release Notes
-
Bug Fixes
- Improved standard library path resolution, particularly on Windows with enhanced lib directory detection for both text-file and symlink-based paths
-
Refactor
- Centralized standard library directory configuration through interpreter state management
✏️ Tip: You can customize this high-level summary in your review settings.
📝 Walkthrough
Walkthrough
Introduces a new stdlib_dir field to the path configuration system, computed via platform-specific logic in getpath.rs, exposed through the sys module as _stdlib_dir, and wired during interpreter initialization to store the standard library location derived from either frozen or dynamic paths.
Changes
| Cohort / File(s) | Summary |
|---|---|
Path configuration structure crates/vm/src/vm/setting.rs |
Added optional stdlib_dir field to Paths struct to store the standard library directory path |
stdlib directory computation crates/vm/src/getpath.rs |
Introduced private calculate_stdlib_dir() helper with platform-specific logic; added Step 9 to init_path_config() to assign computed stdlib_dir with existence validation |
Python API exposure crates/vm/src/stdlib/sys.rs |
Added _stdlib_dir attribute to sys module that returns vm.state.config.paths.stdlib_dir as a PyObjectRef |
Interpreter initialization src/interpreter.rs |
Replaced vm.sys_module hack with PyRc-based state access; centralized stdlib_dir assignment for both frozen and dynamic stdlib path flows |
Windows build configuration crates/pylib/build.rs |
Enhanced Lib path resolution to handle both symlinks and text files (git with/without symlink support); added canonicalization and \?\ prefix stripping |
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~22 minutes
Possibly related PRs
- Move
pylib->crates/pylib#6225: Modifies build.rs Lib path resolution logic for platform/path handling - nt junction #6407: Strips Windows extended-length prefix (
\?\) from canonicalized Lib path in build.rs - Introduce PyConfig to implement Paths correctly #6461: Extends PyConfig/Paths plumbing by adding and wiring the new stdlib_dir field through the path infrastructure
Suggested reviewers
- coolreader18
- arihant2math
Poem
🐰 A rabbit hops through stdlib ways,
Building paths through winding maze,
From frozen lib to dynamic ground,
stdlib_dir must now be found! 📚✨
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Add sys._stdlib_dir' directly reflects the main change: adding a new Python-accessible attribute _stdlib_dir to the sys module, which is the primary public-facing feature of this changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
- 📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Code has been automatically formatted
The code in this PR has been formatted using:
cargo fmt --all
Please pull the latest changes before pushing again:
git pull origin stdlib_dir
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/pylib/build.rs`:
- Around line 14-31: The build script currently calls
canonicalize(...).to_str().unwrap(), which can panic on non‑UTF‑8 Windows paths;
change the emission to use a lossy conversion from the canonicalized Path (e.g.,
Path::to_string_lossy on the value returned by std::fs::canonicalize(&lib_path))
and print that into the cargo:rustc-env=win_lib_path value, and if
std::fs::canonicalize(&lib_path) returns an Err emit a visible build warning via
println!("cargo:warning=...") instead of silently skipping so missing
win_lib_path is obvious; update the code paths referencing canonicalized_path
and to_str().unwrap() accordingly.
Comment on lines +14 to +31
| if cfg!(windows) { | ||
| // On Windows, the Lib entry can be either: | ||
| // 1. A text file containing the relative path (git without symlink support) | ||
| // 2. A proper symlink (git with symlink support) | ||
| // We handle both cases to resolve to the actual Lib directory. | ||
| let lib_path = if let Ok(real_path) = std::fs::read_to_string("Lib") { | ||
| // Case 1: Text file containing relative path | ||
| std::path::PathBuf::from(real_path.trim()) | ||
| } else { | ||
| // Case 2: Symlink or directory - canonicalize directly | ||
| std::path::PathBuf::from("Lib") | ||
| }; | ||
|
|
||
| if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) { | ||
| // Strip the extended path prefix (\\?\) that canonicalize adds on Windows | ||
| let path_str = canonicalized_path.to_str().unwrap(); | ||
| let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(path_str); | ||
| println!("cargo:rustc-env=win_lib_path={path_str}"); |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid panic on non‑UTF‑8 Windows paths.
to_str().unwrap() can panic on valid Windows paths containing non‑UTF‑8 sequences, breaking the build. Prefer a lossy conversion and consider warning when canonicalization fails so missing win_lib_path isn’t silent.
🔧 Proposed safer emission
- if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) { - // Strip the extended path prefix (\\?\) that canonicalize adds on Windows - let path_str = canonicalized_path.to_str().unwrap(); - let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(path_str); - println!("cargo:rustc-env=win_lib_path={path_str}"); - } + if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) { + // Strip the extended path prefix (\\?\) that canonicalize adds on Windows + let path_str = canonicalized_path.to_string_lossy(); + let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(&path_str); + println!("cargo:rustc-env=win_lib_path={path_str}"); + } else { + println!("cargo:warning=failed to canonicalize Lib path: {:?}", lib_path); + }
🤖 Prompt for AI Agents
In `@crates/pylib/build.rs` around lines 14 - 31, The build script currently calls
canonicalize(...).to_str().unwrap(), which can panic on non‑UTF‑8 Windows paths;
change the emission to use a lossy conversion from the canonicalized Path (e.g.,
Path::to_string_lossy on the value returned by std::fs::canonicalize(&lib_path))
and print that into the cargo:rustc-env=win_lib_path value, and if
std::fs::canonicalize(&lib_path) returns an Err emit a visible build warning via
println!("cargo:warning=...") instead of silently skipping so missing
win_lib_path is obvious; update the code paths referencing canonicalized_path
and to_str().unwrap() accordingly.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/codegen/src/compile.rs`:
- Around line 6983-6986: The DictMerge path currently uses the first mapping as
the accumulator which can mutate a caller-provided **kwargs dict; change
compile.rs so that when merging multiple mappings (the loop around emit!(self,
Instruction::DictMerge { index: 1 })), you first allocate a fresh empty
accumulator and merge each mapping into that fresh map instead of using the
first input as the target; specifically, before the for-loop that emits
Instruction::DictMerge, emit an instruction to create a new empty dict (so the
accumulator is new) and then merge all incoming `**` groups into that fresh
accumulator (update the code that emits Instruction::DictMerge and the
surrounding logic in the same block to use the new accumulator).
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)
962-962: Inconsistent downcast pattern withDictUpdate.
DictUpdate(line 934) uses a safe downcast with.expect():let dict = dict.downcast_ref::<PyDict>().expect("exact dict expected");But
DictMergeuses an unsafe downcast without aSAFETYcomment explaining the invariant. Consider using the same safe pattern asDictUpdatefor consistency and defensive coding, or at minimum add aSAFETYcomment like the other unsafe downcasts in this file (e.g.,ListAppend,ListExtend).♻️ Suggested fix for consistency
- let dict: &Py<PyDict> = unsafe { dict_ref.downcast_unchecked_ref() }; + let dict: &Py<PyDict> = unsafe { + // SAFETY: compiler guarantees correct type + dict_ref.downcast_unchecked_ref() + };Or use the safe pattern:
- let dict: &Py<PyDict> = unsafe { dict_ref.downcast_unchecked_ref() }; + let dict = dict_ref.downcast_ref::<PyDict>().expect("exact dict expected");
Comment on lines +6983 to +6986
| // Merge all dicts: first dict is accumulator, merge rest into it | ||
| for _ in 1..size { | ||
| emit!(self, Instruction::DictMerge { index: 1 }); | ||
| } |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid mutating user-provided **kwargs when merging.
DictMerge uses the first mapping as the accumulator. If the first group is **kwargs, this mutates the caller’s dict when multiple ** groups exist (observable vs CPython, which builds a fresh dict). Please ensure the accumulator is always a new map before merging any ** inputs.
🔧 Possible fix (merge into a fresh accumulator)
- let mut size = 0; - let groupby = keywords.iter().chunk_by(|e| e.arg.is_none()); - for (is_unpacking, sub_keywords) in &groupby { - if is_unpacking { - for keyword in sub_keywords { - self.compile_expression(&keyword.value)?; - size += 1; - } - } else { - let mut sub_size = 0; - for keyword in sub_keywords { - if let Some(name) = &keyword.arg { - self.emit_load_const(ConstantData::Str { - value: name.as_str().into(), - }); - self.compile_expression(&keyword.value)?; - sub_size += 1; - } - } - emit!(self, Instruction::BuildMap { size: sub_size }); - size += 1; - } - } - if size > 1 { - // Merge all dicts: first dict is accumulator, merge rest into it - for _ in 1..size { - emit!(self, Instruction::DictMerge { index: 1 }); - } - } + let mut have_accumulator = false; + let groupby = keywords.iter().chunk_by(|e| e.arg.is_none()); + for (is_unpacking, sub_keywords) in &groupby { + if !have_accumulator { + emit!(self, Instruction::BuildMap { size: 0 }); // fresh accumulator + have_accumulator = true; + } + if is_unpacking { + for keyword in sub_keywords { + self.compile_expression(&keyword.value)?; + emit!(self, Instruction::DictMerge { index: 1 }); + } + } else { + let mut sub_size = 0; + for keyword in sub_keywords { + if let Some(name) = &keyword.arg { + self.emit_load_const(ConstantData::Str { + value: name.as_str().into(), + }); + self.compile_expression(&keyword.value)?; + sub_size += 1; + } + } + emit!(self, Instruction::BuildMap { size: sub_size }); + emit!(self, Instruction::DictMerge { index: 1 }); + } + }
🤖 Prompt for AI Agents
In `@crates/codegen/src/compile.rs` around lines 6983 - 6986, The DictMerge path
currently uses the first mapping as the accumulator which can mutate a
caller-provided **kwargs dict; change compile.rs so that when merging multiple
mappings (the loop around emit!(self, Instruction::DictMerge { index: 1 })), you
first allocate a fresh empty accumulator and merge each mapping into that fresh
map instead of using the first input as the target; specifically, before the
for-loop that emits Instruction::DictMerge, emit an instruction to create a new
empty dict (so the accumulator is new) and then merge all incoming `**` groups
into that fresh accumulator (update the code that emits Instruction::DictMerge
and the surrounding logic in the same block to use the new accumulator).