◐ Shell
clean mode source ↗

Add sys._stdlib_dir by youknowone · Pull Request #6798 · RustPython/RustPython

@youknowone

@youknowone youknowone commented

Jan 19, 2026

edited by coderabbitai Bot

Loading

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved standard library path resolution, particularly on Windows with enhanced lib directory detection for both text-file and symlink-based paths
  • Refactor

    • Centralized standard library directory configuration through interpreter state management

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

📝 Walkthrough

Walkthrough

Introduces a new stdlib_dir field to the path configuration system, computed via platform-specific logic in getpath.rs, exposed through the sys module as _stdlib_dir, and wired during interpreter initialization to store the standard library location derived from either frozen or dynamic paths.

Changes

Cohort / File(s) Summary
Path configuration structure
crates/vm/src/vm/setting.rs
Added optional stdlib_dir field to Paths struct to store the standard library directory path
stdlib directory computation
crates/vm/src/getpath.rs
Introduced private calculate_stdlib_dir() helper with platform-specific logic; added Step 9 to init_path_config() to assign computed stdlib_dir with existence validation
Python API exposure
crates/vm/src/stdlib/sys.rs
Added _stdlib_dir attribute to sys module that returns vm.state.config.paths.stdlib_dir as a PyObjectRef
Interpreter initialization
src/interpreter.rs
Replaced vm.sys_module hack with PyRc-based state access; centralized stdlib_dir assignment for both frozen and dynamic stdlib path flows
Windows build configuration
crates/pylib/build.rs
Enhanced Lib path resolution to handle both symlinks and text files (git with/without symlink support); added canonicalization and \?\ prefix stripping

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • coolreader18
  • arihant2math

Poem

🐰 A rabbit hops through stdlib ways,
Building paths through winding maze,
From frozen lib to dynamic ground,
stdlib_dir must now be found! 📚✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Add sys._stdlib_dir' directly reflects the main change: adding a new Python-accessible attribute _stdlib_dir to the sys module, which is the primary public-facing feature of this changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin stdlib_dir

coderabbitai[bot]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/pylib/build.rs`:
- Around line 14-31: The build script currently calls
canonicalize(...).to_str().unwrap(), which can panic on non‑UTF‑8 Windows paths;
change the emission to use a lossy conversion from the canonicalized Path (e.g.,
Path::to_string_lossy on the value returned by std::fs::canonicalize(&lib_path))
and print that into the cargo:rustc-env=win_lib_path value, and if
std::fs::canonicalize(&lib_path) returns an Err emit a visible build warning via
println!("cargo:warning=...") instead of silently skipping so missing
win_lib_path is obvious; update the code paths referencing canonicalized_path
and to_str().unwrap() accordingly.

Comment on lines +14 to +31

if cfg!(windows) {
// On Windows, the Lib entry can be either:
// 1. A text file containing the relative path (git without symlink support)
// 2. A proper symlink (git with symlink support)
// We handle both cases to resolve to the actual Lib directory.
let lib_path = if let Ok(real_path) = std::fs::read_to_string("Lib") {
// Case 1: Text file containing relative path
std::path::PathBuf::from(real_path.trim())
} else {
// Case 2: Symlink or directory - canonicalize directly
std::path::PathBuf::from("Lib")
};

if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) {
// Strip the extended path prefix (\\?\) that canonicalize adds on Windows
let path_str = canonicalized_path.to_str().unwrap();
let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(path_str);
println!("cargo:rustc-env=win_lib_path={path_str}");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid panic on non‑UTF‑8 Windows paths.
to_str().unwrap() can panic on valid Windows paths containing non‑UTF‑8 sequences, breaking the build. Prefer a lossy conversion and consider warning when canonicalization fails so missing win_lib_path isn’t silent.

🔧 Proposed safer emission
-        if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) {
-            // Strip the extended path prefix (\\?\) that canonicalize adds on Windows
-            let path_str = canonicalized_path.to_str().unwrap();
-            let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(path_str);
-            println!("cargo:rustc-env=win_lib_path={path_str}");
-        }
+        if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) {
+            // Strip the extended path prefix (\\?\) that canonicalize adds on Windows
+            let path_str = canonicalized_path.to_string_lossy();
+            let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(&path_str);
+            println!("cargo:rustc-env=win_lib_path={path_str}");
+        } else {
+            println!("cargo:warning=failed to canonicalize Lib path: {:?}", lib_path);
+        }
🤖 Prompt for AI Agents
In `@crates/pylib/build.rs` around lines 14 - 31, The build script currently calls
canonicalize(...).to_str().unwrap(), which can panic on non‑UTF‑8 Windows paths;
change the emission to use a lossy conversion from the canonicalized Path (e.g.,
Path::to_string_lossy on the value returned by std::fs::canonicalize(&lib_path))
and print that into the cargo:rustc-env=win_lib_path value, and if
std::fs::canonicalize(&lib_path) returns an Err emit a visible build warning via
println!("cargo:warning=...") instead of silently skipping so missing
win_lib_path is obvious; update the code paths referencing canonicalized_path
and to_str().unwrap() accordingly.

coderabbitai[bot]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/codegen/src/compile.rs`:
- Around line 6983-6986: The DictMerge path currently uses the first mapping as
the accumulator which can mutate a caller-provided **kwargs dict; change
compile.rs so that when merging multiple mappings (the loop around emit!(self,
Instruction::DictMerge { index: 1 })), you first allocate a fresh empty
accumulator and merge each mapping into that fresh map instead of using the
first input as the target; specifically, before the for-loop that emits
Instruction::DictMerge, emit an instruction to create a new empty dict (so the
accumulator is new) and then merge all incoming `**` groups into that fresh
accumulator (update the code that emits Instruction::DictMerge and the
surrounding logic in the same block to use the new accumulator).
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

962-962: Inconsistent downcast pattern with DictUpdate.

DictUpdate (line 934) uses a safe downcast with .expect():

let dict = dict.downcast_ref::<PyDict>().expect("exact dict expected");

But DictMerge uses an unsafe downcast without a SAFETY comment explaining the invariant. Consider using the same safe pattern as DictUpdate for consistency and defensive coding, or at minimum add a SAFETY comment like the other unsafe downcasts in this file (e.g., ListAppend, ListExtend).

♻️ Suggested fix for consistency
-                let dict: &Py<PyDict> = unsafe { dict_ref.downcast_unchecked_ref() };
+                let dict: &Py<PyDict> = unsafe {
+                    // SAFETY: compiler guarantees correct type
+                    dict_ref.downcast_unchecked_ref()
+                };

Or use the safe pattern:

-                let dict: &Py<PyDict> = unsafe { dict_ref.downcast_unchecked_ref() };
+                let dict = dict_ref.downcast_ref::<PyDict>().expect("exact dict expected");

Comment on lines +6983 to +6986

// Merge all dicts: first dict is accumulator, merge rest into it
for _ in 1..size {
emit!(self, Instruction::DictMerge { index: 1 });
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid mutating user-provided **kwargs when merging.
DictMerge uses the first mapping as the accumulator. If the first group is **kwargs, this mutates the caller’s dict when multiple ** groups exist (observable vs CPython, which builds a fresh dict). Please ensure the accumulator is always a new map before merging any ** inputs.

🔧 Possible fix (merge into a fresh accumulator)
-        let mut size = 0;
-        let groupby = keywords.iter().chunk_by(|e| e.arg.is_none());
-        for (is_unpacking, sub_keywords) in &groupby {
-            if is_unpacking {
-                for keyword in sub_keywords {
-                    self.compile_expression(&keyword.value)?;
-                    size += 1;
-                }
-            } else {
-                let mut sub_size = 0;
-                for keyword in sub_keywords {
-                    if let Some(name) = &keyword.arg {
-                        self.emit_load_const(ConstantData::Str {
-                            value: name.as_str().into(),
-                        });
-                        self.compile_expression(&keyword.value)?;
-                        sub_size += 1;
-                    }
-                }
-                emit!(self, Instruction::BuildMap { size: sub_size });
-                size += 1;
-            }
-        }
-        if size > 1 {
-            // Merge all dicts: first dict is accumulator, merge rest into it
-            for _ in 1..size {
-                emit!(self, Instruction::DictMerge { index: 1 });
-            }
-        }
+        let mut have_accumulator = false;
+        let groupby = keywords.iter().chunk_by(|e| e.arg.is_none());
+        for (is_unpacking, sub_keywords) in &groupby {
+            if !have_accumulator {
+                emit!(self, Instruction::BuildMap { size: 0 }); // fresh accumulator
+                have_accumulator = true;
+            }
+            if is_unpacking {
+                for keyword in sub_keywords {
+                    self.compile_expression(&keyword.value)?;
+                    emit!(self, Instruction::DictMerge { index: 1 });
+                }
+            } else {
+                let mut sub_size = 0;
+                for keyword in sub_keywords {
+                    if let Some(name) = &keyword.arg {
+                        self.emit_load_const(ConstantData::Str {
+                            value: name.as_str().into(),
+                        });
+                        self.compile_expression(&keyword.value)?;
+                        sub_size += 1;
+                    }
+                }
+                emit!(self, Instruction::BuildMap { size: sub_size });
+                emit!(self, Instruction::DictMerge { index: 1 });
+            }
+        }
🤖 Prompt for AI Agents
In `@crates/codegen/src/compile.rs` around lines 6983 - 6986, The DictMerge path
currently uses the first mapping as the accumulator which can mutate a
caller-provided **kwargs dict; change compile.rs so that when merging multiple
mappings (the loop around emit!(self, Instruction::DictMerge { index: 1 })), you
first allocate a fresh empty accumulator and merge each mapping into that fresh
map instead of using the first input as the target; specifically, before the
for-loop that emits Instruction::DictMerge, emit an instruction to create a new
empty dict (so the accumulator is new) and then merge all incoming `**` groups
into that fresh accumulator (update the code that emits Instruction::DictMerge
and the surrounding logic in the same block to use the new accumulator).