◐ Shell
reader mode source ↗
Skip to content

Multi phase module init#6740

Merged
youknowone merged 5 commits into
RustPython:mainfrom
youknowone:multi-phase-module-init
Jan 22, 2026
Merged

Multi phase module init#6740
youknowone merged 5 commits into
RustPython:mainfrom
youknowone:multi-phase-module-init

Conversation

@youknowone

@youknowone youknowone commented Jan 16, 2026

Copy link
Copy Markdown
Member

Summary by CodeRabbit

  • New Features

    • Builder-based interpreter API for configurable construction and stdlib registration.
    • Multi-phase module initialization (create + exec) to support safer circular imports.
  • Refactor

    • Stdlib and native module bootstrapping reorganized to use module definitions and exec hooks across many modules.
    • Public initialization surface unified around the builder and module_def/module_exec pattern.
  • Behavior

    • Additional file metadata fields exposed on Windows.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 16, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Introduce InterpreterBuilder and migrate stdlib/native module wiring from single‑step make_module to multi‑phase PyModuleDef APIs (module_def + module_exec). VM bootstrap now creates → inserts into sys.modules → exec (with rollback on failure). Callers updated to builder-based initialization and stdlib_module_defs(ctx).

Changes

Cohort / File(s) Summary
Interpreter & builder
crates/vm/src/vm/interpreter.rs, crates/vm/src/vm/mod.rs, crates/vm/src/lib.rs
Add InterpreterBuilder, builder API (settings, add_native_module(s), init_hook, with_frozen, build), initialize_main_vm; expose InterpreterBuilder; shift VM global state to module_defs.
Import & two‑phase init
crates/vm/src/import.rs, crates/vm/src/builtins/module.rs, crates/vm/src/builtins/mod.rs
Add PyModuleDef::create_module and exec_module; implement create → insert into sys.modules → exec with rollback on exec failure; update builtins init sequencing.
Stdlib registry
crates/stdlib/src/lib.rs, crates/vm/src/stdlib/mod.rs
Replace get_module_inits/StdlibMap with stdlib_module_defs(ctx) / builtin_module_defs(ctx) returning Vec<&'static PyModuleDef>; switch per-module wiring to module_def(ctx).
Bulk module rewiring
crates/stdlib/src/*, crates/vm/src/stdlib/*, crates/wasm/src/*
Many files: change pub(crate) use X::make_modulepub(crate) use X::module_def; remove top‑level make_module wrappers; add module_exec(vm,module) where module-specific runtime setup is required.
Stdlib modules with custom exec
examples in crates/stdlib and crates/vm/src/stdlib (e.g., array, contextvars, _sqlite3, ctypes, errno, io, socket, unicodedata, typing, etc.)
Add crate-visible module_exec wrappers that call generated __module_exec, perform registrations/platform setup, use ? error propagation, and re-export module_def.
Codegen changes (derive)
crates/derive-impl/src/pymodule.rs, crates/derive/src/lib.rs
Generator now emits module_def and module_exec wiring (rename has_extend_modulehas_module_exec), sets PyModuleDef exec slot to module_exec, and updates docs/examples.
Public API & host changes
src/interpreter.rs, src/lib.rs, src/main.rs, examples, benches, example_projects
Replace InterpreterConfig/InitHook with InterpreterBuilderExt::init_stdlib() and builder-based flows; update run() and examples/benches to use Interpreter::builder(...).add_native_modules(...).build() and stdlib_module_defs(&builder.ctx).
Minor exports & platform tweaks
crates/common/src/crt_fd.rs, crates/vm/src/builtins/mod.rs, crates/vm/src/builtins/module.rs, benches/examples
Small import/visibility tweaks (AsFd import, export PyModuleSlots), add create_module/exec_module helpers on PyModuleDef, and adjust call sites to use builder/context.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Builder as InterpreterBuilder (ctx)
    participant DefList as stdlib_module_defs(&ctx)
    participant BuilderActions as Builder.add_native_modules / build
    participant Interpreter
    participant VM as VirtualMachine
    participant Sys as sys.modules
    participant Def as PyModuleDef

    User->>Builder: Interpreter::builder(settings)
    Builder->>DefList: stdlib_module_defs(&builder.ctx)
    DefList-->>Builder: Vec<&PyModuleDef>
    User->>BuilderActions: add_native_modules(defs) / build()
    BuilderActions->>Interpreter: initialize_main_vm(...)
    Interpreter->>VM: construct VM with module_defs map
    loop for each builtin module_def
      VM->>Def: create_module(vm)           %% Phase 1: create
      Def-->>VM: PyModule instance
      VM->>Sys: insert module into sys.modules (pre-exec)
      VM->>Def: exec_module(vm, module)     %% Phase 2: exec
      Def-->>VM: module init completes or error
      alt exec failed
        VM->>Sys: remove partial module
      end
    end
    Interpreter-->>User: Interpreter (with global_state)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh
  • fanninpm
  • coolreader18

Poem

"🐇
I hop through bytes and stitch the start,
A builder seeds the interpreter's heart.
Modules born, inserted, then run —
create then exec till init is done.
A rabbit cheers: smooth flow, well spun."

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Multi phase module init' accurately describes the primary change in this changeset, which implements a multi-phase module initialization system replacing the previous single-phase approach.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the multi-phase-module-init branch 3 times, most recently from 660f2fb to 054b125 Compare January 16, 2026 06:32
@github-actions

Copy link
Copy Markdown
Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin multi-phase-module-init

@youknowone youknowone force-pushed the multi-phase-module-init branch 2 times, most recently from 338374a to 5b2598b Compare January 16, 2026 06:51
@youknowone youknowone force-pushed the multi-phase-module-init branch 17 times, most recently from 85d072d to ea3a345 Compare January 21, 2026 12:11
@youknowone youknowone force-pushed the multi-phase-module-init branch from ea3a345 to 5eed597 Compare January 21, 2026 13:39
9 hidden items Load more…
@youknowone youknowone force-pushed the multi-phase-module-init branch from 5eed597 to c6dc3d2 Compare January 21, 2026 14:07
@youknowone youknowone marked this pull request as ready for review January 21, 2026 14:23
@youknowone youknowone force-pushed the multi-phase-module-init branch from c6dc3d2 to 7a38e56 Compare January 21, 2026 14:35

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/os.rs (1)

1900-1923: Avoid panics in module_exec; propagate errors instead.

unwrap() during import can abort the interpreter. Returning PyResult<()> and using ? provides a proper Python exception instead. If you adopt this, update callers to handle the new result.

🛠️ Proposed fix
-pub fn module_exec(vm: &VirtualMachine, module: &Py<PyModule>) {
+pub fn module_exec(vm: &VirtualMachine, module: &Py<PyModule>) -> PyResult<()> {
     let support_funcs = _os::support_funcs();
     let supports_fd = PySet::default().into_ref(&vm.ctx);
     let supports_dir_fd = PySet::default().into_ref(&vm.ctx);
     let supports_follow_symlinks = PySet::default().into_ref(&vm.ctx);
     for support in support_funcs {
-        let func_obj = module.get_attr(support.name, vm).unwrap();
+        let func_obj = module.get_attr(support.name, vm)?;
         if support.fd.unwrap_or(false) {
-            supports_fd.clone().add(func_obj.clone(), vm).unwrap();
+            supports_fd.clone().add(func_obj.clone(), vm)?;
         }
         if support.dir_fd.unwrap_or(false) {
-            supports_dir_fd.clone().add(func_obj.clone(), vm).unwrap();
+            supports_dir_fd.clone().add(func_obj.clone(), vm)?;
         }
         if support.follow_symlinks.unwrap_or(false) {
-            supports_follow_symlinks.clone().add(func_obj, vm).unwrap();
+            supports_follow_symlinks.clone().add(func_obj, vm)?;
         }
     }

     extend_module!(vm, module, {
         "supports_fd" => supports_fd,
         "supports_dir_fd" => supports_dir_fd,
         "supports_follow_symlinks" => supports_follow_symlinks,
         "error" => vm.ctx.exceptions.os_error.to_owned(),
     });
+    Ok(())
 }
🤖 Fix all issues with AI agents
In `@crates/vm/src/import.rs`:
- Around line 91-116: The module is inserted into sys_modules via
sys_modules.set_item(...) before def.exec_module(...) and if exec_module fails
the partially-initialized module remains in sys_modules; update the two-phase
init path (the block that uses vm.state.module_defs, def.create_module,
sys_modules.set_item, def.exec_module) to remove the module from sys_modules on
exec failure (call sys_modules.del_item(module_name, vm) or equivalent) before
returning the error so subsequent imports won't return the broken module,
ensuring the original error from def.exec_module is propagated by returning
Err(e).

In `@crates/vm/src/stdlib/errno.rs`:
- Around line 9-23: The module_exec function currently uses unwrap() when
calling errorcode.set_item(...) and module.set_attr(...), which will panic on
failure; change those unwrap() calls to propagate errors by using the ? operator
so the function returns a Python exception instead of aborting. Locate
module_exec (and the loop over super::ERROR_CODES) and replace the two .unwrap()
usages on set_item and set_attr with ? so the PyResult<()> return handles
failures; ensure signatures remain unchanged and error propagation compiles.

In `@crates/vm/src/stdlib/imp.rs`:
- Around line 122-149: When inserting the module into sys.modules before calling
the def.slots.exec slot (module variable, sys_modules.set_item and
def.slots.exec), ensure you remove that entry if exec returns an Err so a
partially-initialized module is not left cached; change the exec call to
match/if-let on its Result, and on Err call sys_modules.del_item(&*name, vm) (or
the equivalent removal API) before propagating the error so behavior matches
CPython’s cleanup on exec failure.

In `@crates/vm/src/stdlib/mod.rs`:
- Around line 96-104: The posix::module_def(ctx) is being registered twice for
WASI because both #[cfg(target_os = "wasi")] and the broader
#[cfg(all(any(not(target_arch = "wasm32"), target_os = "wasi"), not(any(unix,
windows))))] match; remove the redundant explicit #[cfg(target_os = "wasi")]
posix::module_def(ctx) entry so posix::module_def(ctx) is only included once
(leave the broader cfg that covers non-unix/windows and WASI targets or vice
versa based on project convention), ensuring only a single
posix::module_def(ctx) call remains in the builtin modules list.

In `@crates/vm/src/vm/interpreter.rs`:
- Around line 30-142: The code in initialize_main_vm currently calls Box::leak
for sysconfigdata_name on every VM creation, leaking memory each time; change
this to leak the platform-specific name only once per process by introducing a
static OnceLock/OnceCell (e.g., static SYSCONFIGDATA_LEAK: OnceLock<&'static
str>) and use get_or_init to perform Box::leak only the first time, then insert
the returned &'static str into all_module_defs; reference initialize_main_vm,
sysconfigdata_name, and the all_module_defs insertion where leaked_name is
created to locate the change.
🧹 Nitpick comments (5)
crates/wasm/src/js_module.rs (1)

623-623: LGTM - Re-export added for module_def pattern.

The addition of pub(crate) use _js::module_def; aligns with the crate-wide module initialization refactoring. Note that this re-export is placed at the end of the file, unlike most other modules where it appears at the top. While functionally equivalent, you may consider moving it to line 2 (after the PyJsValue, PyPromise re-export) for consistency with other stdlib modules.

crates/vm/src/builtins/module.rs (1)

53-71: Remove redundant import.

PyPayload is already imported at the file level (line 3), so the use crate::PyPayload; on line 54 is unnecessary.

Proposed fix
     pub fn create_module(&'static self, vm: &VirtualMachine) -> PyResult<PyRef<PyModule>> {
-        use crate::PyPayload;
-
         // Create module (use create slot if provided, else default creation)
         let module = if let Some(create) = self.slots.create {
crates/stdlib/src/_sqlite3.rs (1)

847-867: Propagate setup_module_exceptions errors instead of panicking.

Line 856 invokes setup_module_exceptions, which still uses unwrap() internally. Since module_exec now returns PyResult<()>, consider returning PyResult<()> from setup_module_exceptions and using ? to avoid panics during module init.

crates/vm/src/stdlib/sysconfigdata.rs (1)

14-25: Chain __module_exec before custom init.

Custom module_exec doesn’t call the macro-generated __module_exec, so any #[pyattr]/#[pyfunction] exports added now or later won’t be registered. Consider calling it first, consistent with other stdlib modules.

♻️ Suggested change
     fn module_exec(vm: &VirtualMachine, module: &Py<PyModule>) -> PyResult<()> {
+        __module_exec(vm, module);
         // Set build_time_vars attribute
         let build_time_vars = build_time_vars(vm);
crates/vm/src/stdlib/sys.rs (1)

168-173: Consider deduplicating builtin module names after augmentation.

If any native module definition already uses "sys" or "builtins", this could yield duplicates in sys.builtin_module_names. A quick dedup after sorting prevents that edge case.

♻️ Proposed tweak
-        module_names.sort();
+        module_names.sort();
+        module_names.dedup();

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/sys.rs (1)

1678-1685: Revert visibility change for sysconfigdata_name.

The function is only called from within crates/vm/src/ (via crate::stdlib::sys::sysconfigdata_name() in interpreter.rs and sysconfigdata.rs), so pub(crate) visibility is sufficient. No external crates use this function. Change it back to pub(crate) to maintain proper encapsulation.

♻️ Duplicate comments (1)
crates/vm/src/vm/interpreter.rs (1)

67-72: Avoid per-interpreter Box::leak for _sysconfigdata aliasing.

This repeats a previously flagged issue: leaking on each interpreter creation grows memory unboundedly. Prefer a process-wide OnceLock to leak once.

🩹 Suggested fix (same as prior review)
     if let Some(&sysconfigdata_def) = all_module_defs.get("_sysconfigdata") {
+        use std::sync::OnceLock;
         let sysconfigdata_name = crate::stdlib::sys::sysconfigdata_name();
         // Leak the string to get a 'static lifetime for the key
-        let leaked_name: &'static str = Box::leak(sysconfigdata_name.into_boxed_str());
-        all_module_defs.insert(leaked_name, sysconfigdata_def);
+        static SYSCONFIGDATA_NAME: OnceLock<&'static str> = OnceLock::new();
+        let leaked_name =
+            *SYSCONFIGDATA_NAME.get_or_init(|| Box::leak(sysconfigdata_name.into_boxed_str()));
+        all_module_defs.insert(leaked_name, sysconfigdata_def);
     }

@youknowone youknowone force-pushed the multi-phase-module-init branch from 7a38e56 to 203756d Compare January 21, 2026 14:54
@youknowone youknowone marked this pull request as draft January 21, 2026 15:00
@youknowone youknowone force-pushed the multi-phase-module-init branch 2 times, most recently from b25c907 to 22ea6ad Compare January 21, 2026 15:14
@youknowone youknowone marked this pull request as ready for review January 21, 2026 23:31
@youknowone youknowone force-pushed the multi-phase-module-init branch from 22ea6ad to bed12ed Compare January 21, 2026 23:58
@github-actions

github-actions Bot commented Jan 21, 2026

Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

Click to expand dependency information
[+] lib: cpython/Lib/token.py
soft_deps:
- [ ] token

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated
  • native: Rust/C extension modules

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/vm/mod.rs (1)

1339-1339: Update test to use builder pattern for consistency.

test_nested_frozen (line 1314) uses Interpreter::builder(), but frozen_origname_matches (line 1339) still uses Interpreter::with_init(). Update the latter to match the builder pattern for consistency with the rest of the test suite.

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/_sqlite3.rs`:
- Around line 847-867: The call to __module_exec(vm, module) in module_exec
currently discards its Result; change the invocation to propagate errors (e.g.,
use __module_exec(vm, module)? or otherwise handle its PyResult) so that any
error returned by __module_exec is forwarded from module_exec; ensure
module_exec still returns PyResult<()> and that subsequent initialization
(ERROR_CODES loop, setup_module_exceptions,
CONVERTERS/ADAPTERS/USER_FUNCTION_EXCEPTION/ENABLE_TRACEBACK setup, and
module.set_attr calls) only runs after successful __module_exec.

In `@crates/vm/src/stdlib/ctypes.rs`:
- Around line 1281-1312: The call to __module_exec(vm, module) in module_exec
currently discards its PyResult return value; change the call to propagate
errors by using the ? operator (i.e., call __module_exec(vm, module)?;) so that
module_exec (which returns PyResult<()>) will return any error from
__module_exec instead of silently ignoring it.

In `@crates/vm/src/stdlib/typing.rs`:
- Around line 191-218: The call to __module_exec(vm, module) inside module_exec
currently discards its PyResult, so initialization errors are ignored; change
the call to propagate errors (e.g., use the ? operator) so that module_exec
returns early with the error when __module_exec fails—update the invocation in
the module_exec function to propagate the result rather than ignoring it.

In `@crates/wasm/src/vm_class.rs`:
- Around line 29-35: In module_exec, the result of __module_exec(vm, module) is
ignored so errors don't propagate; change the call to use the ? operator (i.e.,
__module_exec(vm, module)?;) so any PyErr returned by __module_exec is
propagated, then proceed to extend_module! and return Ok(()) from module_exec;
this ensures module initialization failures are handled consistently.
♻️ Duplicate comments (2)
crates/vm/src/stdlib/imp.rs (1)

143-149: Clean up sys.modules when exec slot fails.

This was flagged in a previous review: if def.slots.exec returns an error on line 148, the module inserted at line 144 remains in sys.modules, leaving a partially initialized module cached. This can cause issues with subsequent imports.

🔧 Suggested fix
             // Add to sys.modules BEFORE exec (critical for circular import handling)
             sys_modules.set_item(&*name, module.clone().into(), vm)?;

             // Phase 2: Call exec slot (can safely import other modules now)
             if let Some(exec) = def.slots.exec {
-                exec(vm, &module)?;
+                if let Err(err) = exec(vm, &module) {
+                    let _ = sys_modules.del_item(&*name, vm);
+                    return Err(err);
+                }
             }
crates/vm/src/stdlib/mod.rs (1)

96-102: Potential duplicate posix module registration on WASI targets.

The cfg conditions may still overlap for WASI:

  • Line 96-97: #[cfg(any(unix, target_os = "wasi"))] matches WASI
  • Lines 98-101: #[cfg(all(any(not(target_arch = "wasm32"), target_os = "wasi"), not(any(unix, windows))))] also matches WASI since target_os = "wasi" satisfies the first part and WASI is neither unix nor windows

This could cause posix::module_def(ctx) to be included twice in the vector for WASI builds.

#!/bin/bash
# Verify the cfg conditions for WASI target
# Check if both conditions can be true simultaneously for wasi

echo "Checking posix module cfg conditions in mod.rs..."
rg -n "posix::module_def" crates/vm/src/stdlib/mod.rs -B2 -A1

echo ""
echo "Checking how other files handle WASI posix module:"
rg -n "target_os.*wasi" crates/vm/src/stdlib/mod.rs
🧹 Nitpick comments (4)
crates/vm/src/vm/interpreter.rs (1)

438-474: Consider tracking the FIXME about duplicate frozen core_modules.
If you’d like, I can help draft a follow-up to avoid double-generation while keeping without_stdlib() and init_stdlib() consistent.

crates/derive-impl/src/pymodule.rs (1)

56-90: Consider guarding def.slots.exec assignment or documenting that module_exec must not be cfg-gated.

The detection at lines 86-90 sets has_module_exec = true based on the function name alone, without checking func.attrs for #[cfg] guards. While no cfg-gated module_exec functions currently exist in the codebase (all use conditional code inside the function body instead), the implementation remains fragile: if a user adds a top-level #[cfg]-gated module_exec, the fallback suppression (line 149) combined with the unconditional reference at line 142 would cause silent build failures on excluded configurations.

Prevent this by either checking cfg attributes during detection, conditionally wrapping the def.slots.exec assignment, or documenting that module_exec cannot be cfg-gated at the function level.

crates/vm/src/stdlib/os.rs (1)

1900-1924: Consider error propagation in module_exec.

The function uses multiple unwrap() calls (lines 1906, 1908, 1911, 1914) that could panic during module initialization if get_attr or add fails. While the current callers in posix.rs and nt.rs don't handle errors from this function, propagating errors would be more consistent with the pattern in errno.rs which uses ? for error handling.

Given the callers currently ignore the result anyway (they just call os::module_exec(vm, module);), this is a lower-priority refactor that could be addressed in a follow-up.

♻️ Suggested change for error propagation
-pub fn module_exec(vm: &VirtualMachine, module: &Py<PyModule>) {
+pub fn module_exec(vm: &VirtualMachine, module: &Py<PyModule>) -> PyResult<()> {
     let support_funcs = _os::support_funcs();
     let supports_fd = PySet::default().into_ref(&vm.ctx);
     let supports_dir_fd = PySet::default().into_ref(&vm.ctx);
     let supports_follow_symlinks = PySet::default().into_ref(&vm.ctx);
     for support in support_funcs {
-        let func_obj = module.get_attr(support.name, vm).unwrap();
+        let func_obj = module.get_attr(support.name, vm)?;
         if support.fd.unwrap_or(false) {
-            supports_fd.clone().add(func_obj.clone(), vm).unwrap();
+            supports_fd.clone().add(func_obj.clone(), vm)?;
         }
         if support.dir_fd.unwrap_or(false) {
-            supports_dir_fd.clone().add(func_obj.clone(), vm).unwrap();
+            supports_dir_fd.clone().add(func_obj.clone(), vm)?;
         }
         if support.follow_symlinks.unwrap_or(false) {
-            supports_follow_symlinks.clone().add(func_obj, vm).unwrap();
+            supports_follow_symlinks.clone().add(func_obj, vm)?;
         }
     }
     // ... extend_module! ...
+    Ok(())
 }

This would require updating callers in posix.rs, nt.rs, and posix_compat.rs to propagate the error.

crates/vm/src/vm/mod.rs (1)

231-232: Minor grammar issue in error message.

Consider: "rustpython_pylib may not be set" instead of "maybe not set" for grammatical correctness.

Suggested fix
             let guide_message = if cfg!(feature = "freeze-stdlib") {
-                "`rustpython_pylib` maybe not set while using `freeze-stdlib` feature. Try using `rustpython::InterpreterBuilder::init_stdlib` or manually call `builder.add_frozen_modules(rustpython_pylib::FROZEN_STDLIB)` in `rustpython_vm::Interpreter::builder()`."
+                "`rustpython_pylib` may not be set while using `freeze-stdlib` feature. Try using `rustpython::InterpreterBuilder::init_stdlib` or manually call `builder.add_frozen_modules(rustpython_pylib::FROZEN_STDLIB)` in `rustpython_vm::Interpreter::builder()`."

@youknowone youknowone force-pushed the multi-phase-module-init branch from bed12ed to 04a9da3 Compare January 22, 2026 00:31

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/os.rs (1)

1900-1923: Replace unwrap() calls in os::module_exec with ? operator; change return type to PyResult<()> and update callers.

The current module_exec function panics on get_attr() or add() failures (OOM, missing attributes). Change the return type to PyResult<()>, replace the four unwrap() calls with ? operators, and update the three callers in posix.rs, nt.rs, and posix_compat.rs to use ? when calling super::super::os::module_exec(vm, module).

🔧 Proposed fix
-pub fn module_exec(vm: &VirtualMachine, module: &Py<PyModule>) {
+pub fn module_exec(vm: &VirtualMachine, module: &Py<PyModule>) -> PyResult<()> {
     let support_funcs = _os::support_funcs();
     let supports_fd = PySet::default().into_ref(&vm.ctx);
     let supports_dir_fd = PySet::default().into_ref(&vm.ctx);
     let supports_follow_symlinks = PySet::default().into_ref(&vm.ctx);
     for support in support_funcs {
-        let func_obj = module.get_attr(support.name, vm).unwrap();
+        let func_obj = module.get_attr(support.name, vm)?;
         if support.fd.unwrap_or(false) {
-            supports_fd.clone().add(func_obj.clone(), vm).unwrap();
+            supports_fd.clone().add(func_obj.clone(), vm)?;
         }
         if support.dir_fd.unwrap_or(false) {
-            supports_dir_fd.clone().add(func_obj.clone(), vm).unwrap();
+            supports_dir_fd.clone().add(func_obj.clone(), vm)?;
         }
         if support.follow_symlinks.unwrap_or(false) {
-            supports_follow_symlinks.clone().add(func_obj, vm).unwrap();
+            supports_follow_symlinks.clone().add(func_obj, vm)?;
         }
     }

     extend_module!(vm, module, {
         "supports_fd" => supports_fd,
         "supports_dir_fd" => supports_dir_fd,
         "supports_follow_symlinks" => supports_follow_symlinks,
         "error" => vm.ctx.exceptions.os_error.to_owned(),
     });
+    Ok(())
 }

Update callers:

  • posix.rs: super::super::os::module_exec(vm, module)?;
  • nt.rs: super::super::os::module_exec(vm, module)?;
  • posix_compat.rs: super::super::os::module_exec(vm, module)?;

This aligns with the pattern used in other stdlib modules (sysconfigdata, errno, io, signal) that return PyResult<()> and propagate errors rather than panicking during initialization.

🤖 Fix all issues with AI agents
In `@crates/derive-impl/src/pymodule.rs`:
- Around line 85-90: The derive currently marks context.has_module_exec = true
when it sees an Item::Fn named module_exec but doesn't handle cfg-gated
declarations, which leads to emitting def.slots.exec = Some(module_exec)
unconditionally; update the visitor in pymodule.rs to detect if the
function/item named module_exec has any #[cfg(...)] attributes (inspect
func.attrs for a Meta path "cfg") and, if so, emit a compile_error (or return a
syn::Error) rejecting cfg‑gated module_exec with a clear message that the cfg
must be moved to the containing module or a non-gated fallback provided, instead
of setting context.has_module_exec = true and generating def.slots.exec =
Some(module_exec).

In `@crates/stdlib/src/pyexpat.rs`:
- Around line 53-65: The submodules "model" and "errors" created in module_exec
(variables model and errors) are attached to the pyexpat module but not
registered in sys.modules with their qualified names; update module_exec to
insert entries for "pyexpat.model" and "pyexpat.errors" into the VM's
sys.modules mapping after creating the submodules and before returning (use
vm.sys_module or the VM API to access sys.modules), ensuring the Py<PyModule>
values for model and errors are stored under those qualified keys so imports and
introspection that expect sys.modules["pyexpat.model"] /
sys.modules["pyexpat.errors"] will work as in CPython.

In `@crates/vm/src/stdlib/ast/python.rs`:
- Around line 100-107: The call to __module_exec(...) currently ignores its
PyResult, letting initialization errors be dropped; change module_exec to
propagate the error from __module_exec (e.g. use the ? operator or return on
Err) before calling super::super::pyast::extend_module_nodes so that failures in
__module_exec are returned as Err(PyErr) instead of always returning Ok(()).
Ensure you reference the existing functions __module_exec and module_exec and
preserve the subsequent call to extend_module_nodes only after successful
__module_exec.

In `@crates/vm/src/stdlib/sys.rs`:
- Around line 168-174: The returned tuple from builtin_module_names may contain
duplicate "sys" or "builtins" entries; in the builtin_module_names function,
after collecting and pushing module names (module_names) and sorting, remove
duplicates (e.g., dedupe the Vec<String> in-place) before converting to a
PyTupleRef so the exported tuple contains unique module names only.
♻️ Duplicate comments (5)
crates/wasm/src/vm_class.rs (1)

24-35: Propagate __module_exec errors (already noted earlier).
Line 30 ignores the PyResult from __module_exec, so init errors won’t propagate.

crates/vm/src/stdlib/typing.rs (1)

191-218: Propagate __module_exec errors.

Same issue: __module_exec(vm, module); discards errors; use ? to avoid silent init failures.

crates/vm/src/stdlib/ctypes.rs (1)

1281-1311: Propagate __module_exec errors.

Same issue: ignoring the Result hides init failures.

crates/vm/src/stdlib/imp.rs (1)

122-149: Remove sys.modules entry if exec fails.

If the exec slot errors, a partially initialized module remains cached. Clean it up before propagating the error to match CPython’s behavior.

🔧 Proposed fix
-            if let Some(exec) = def.slots.exec {
-                exec(vm, &module)?;
-            }
+            if let Some(exec) = def.slots.exec {
+                if let Err(err) = exec(vm, &module) {
+                    let _ = sys_modules.del_item(&*name, vm);
+                    return Err(err);
+                }
+            }
crates/vm/src/stdlib/mod.rs (1)

96-102: Avoid duplicate posix module registration on WASI.

Both cfgs can match for target_os = "wasi", causing posix::module_def(ctx) to be included twice. Consider narrowing the first entry to unix-only (or remove one entry).

🛠 Suggested fix
-        #[cfg(any(unix, target_os = "wasi"))]
+        #[cfg(unix)]
         posix::module_def(ctx),

@youknowone youknowone force-pushed the multi-phase-module-init branch from 04a9da3 to 1087d6c Compare January 22, 2026 01:15

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/openssl.rs`:
- Around line 100-108: module_exec currently ignores the PyResult returned by
__module_exec, masking initialization errors; change module_exec to propagate
that result instead of always returning Ok(()). After the existing
LazyLock::force call, call __module_exec(vm, module) and return its PyResult
(either via the ? operator or by returning the result directly) so any error
from __module_exec is forwarded to the caller.

In `@crates/vm/src/builtins/module.rs`:
- Around line 44-81: The create_module implementation must guard against create
slots returning modules with no def and pass a consistent spec object: when
self.slots.create is Some, build and pass a module spec object (not just a
string) compatible with imp.rs::create_builtin (i.e., an object with a name
attribute), then call the create slot; after creation, check the returned
module's def (module.def or module.def.is_some()) and only call
PyModule::__init_dict_from_def and module.__init_methods if the def is present;
otherwise skip those calls to avoid panics for modules created via
PyModule::new().

In `@crates/vm/src/stdlib/errno.rs`:
- Around line 9-22: The module_exec function currently calls __module_exec(vm,
module) and discards its PyResult, which can hide initialization errors; change
the call to propagate errors by using the ? operator (i.e., __module_exec(vm,
module)?;) so module_exec returns early with the underlying PyErr when
__module_exec fails, keeping the function's PyResult return type semantics
intact.

In `@crates/vm/src/stdlib/time.rs`:
- Around line 577-588: The module initialization currently calls
__module_exec(vm, module) but discards its PyResult, masking failures; update
module_exec to propagate errors by returning the result of __module_exec(vm,
module) (i.e., replace the ignored call + Ok(()) with returning __module_exec's
PyResult), preserving the existing unsafe c_tzset call and signature of
module_exec (working with VirtualMachine and Py<crate::builtins::PyModule>).
♻️ Duplicate comments (7)
crates/stdlib/src/pyexpat.rs (1)

42-66: Register pyexpat submodules in sys.modules for qualified imports

The module still attaches model/errors as attributes without registering pyexpat.model/pyexpat.errors in sys.modules, which can break qualified imports and introspection.

crates/vm/src/stdlib/imp.rs (1)

118-149: Remove partially initialized modules on exec failure.
If exec errors, the module stays in sys.modules and may be reused in a bad state.

🛠️ Suggested fix
-            if let Some(exec) = def.slots.exec {
-                exec(vm, &module)?;
-            }
+            if let Some(exec) = def.slots.exec {
+                if let Err(err) = exec(vm, &module) {
+                    let _ = sys_modules.del_item(&*name, vm);
+                    return Err(err);
+                }
+            }
crates/vm/src/stdlib/typing.rs (1)

191-218: Propagate __module_exec failures.
The result is dropped; initialization errors can be silently ignored.

🛠️ Suggested fix
-        __module_exec(vm, module);
+        __module_exec(vm, module)?;
crates/vm/src/stdlib/ctypes.rs (1)

1281-1312: Propagate __module_exec failures.
The result is dropped; initialization errors can be silently ignored.

🛠️ Suggested fix
-        __module_exec(vm, module);
+        __module_exec(vm, module)?;
crates/vm/src/stdlib/sys.rs (1)

168-174: Duplicate: dedupe builtin_module_names after sorting.
module_defs can already include "sys"/"builtins", so the tuple may contain duplicates.

crates/derive-impl/src/pymodule.rs (1)

85-90: Duplicate: reject cfg‑gated module_exec to avoid missing symbol.

crates/vm/src/stdlib/mod.rs (1)

96-102: Duplicate posix module registration on WASI targets.

On WASI targets, both conditions match:

  • #[cfg(any(unix, target_os = "wasi"))] (line 96) matches because target_os = "wasi"
  • #[cfg(all(any(not(target_arch = "wasm32"), target_os = "wasi"), not(any(unix, windows))))] (lines 98-101) also matches because WASI is wasm32 but has target_os = "wasi", and WASI is neither unix nor windows

This results in posix::module_def(ctx) being included twice in the vector for WASI builds, which could cause issues during module registration.

🔧 Suggested fix
         #[cfg(any(unix, target_os = "wasi"))]
         posix::module_def(ctx),
         #[cfg(all(
             any(not(target_arch = "wasm32"), target_os = "wasi"),
-            not(any(unix, windows))
+            not(any(unix, windows, target_os = "wasi"))
         ))]
         posix::module_def(ctx),

Alternatively, remove the WASI-specific entry since it's already covered:

-        #[cfg(any(unix, target_os = "wasi"))]
-        posix::module_def(ctx),
+        #[cfg(unix)]
+        posix::module_def(ctx),
+        #[cfg(target_os = "wasi")]
+        posix::module_def(ctx),
         #[cfg(all(
             any(not(target_arch = "wasm32"), target_os = "wasi"),
             not(any(unix, windows))
         ))]
         posix::module_def(ctx),
🧹 Nitpick comments (2)
src/interpreter.rs (2)

30-37: Potential panic if state has multiple references.

The .unwrap() on PyRc::get_mut will panic if there are other references to vm.state. While this is likely safe during initialization (when init_hook runs), consider using expect() with a descriptive message to aid debugging if assumptions are violated.

-    let state = PyRc::get_mut(&mut vm.state).unwrap();
+    let state = PyRc::get_mut(&mut vm.state)
+        .expect("VM state should be uniquely owned during init_hook");

58-76: Potential panic on non-UTF-8 paths.

The .unwrap() on into_string() (line 64) will panic if BUILDTIME_RUSTPYTHONPATH contains non-UTF-8 characters. While unlikely in controlled build environments, consider using to_string_lossy() or handling the error gracefully:

-            .map(|path| path.into_os_string().into_string().unwrap())
+            .filter_map(|path| path.into_os_string().into_string().ok())

Alternatively, if non-UTF-8 paths should cause a build/runtime failure, document this expectation.

@youknowone youknowone force-pushed the multi-phase-module-init branch from 1087d6c to 2037645 Compare January 22, 2026 02:21

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/vm/src/stdlib/io.rs`:
- Around line 4732-4744: The fileio module is missing a pub(crate) module_exec
function required by the call super::fileio::module_exec(vm, module)?; — add a
minimal wrapper with the signature pub(crate) fn module_exec(vm:
&VirtualMachine, module: &Py<PyModule>) -> PyResult<()> that calls the
autogenerated __module_exec(vm, module) and returns Ok(()), so the call site
compiles on non-WASM platforms; reference the __module_exec symbol inside the
new fileio::module_exec.
♻️ Duplicate comments (7)
crates/vm/src/stdlib/ctypes.rs (1)

1281-1288: Propagate errors from __module_exec.

Line 1287 discards the PyResult return; errors will be silently ignored.

🔧 Suggested fix
-        __module_exec(vm, module);
+        __module_exec(vm, module)?;
crates/stdlib/src/pyexpat.rs (1)

53-63: Register pyexpat.model / pyexpat.errors in sys.modules for qualified imports.

crates/vm/src/stdlib/imp.rs (1)

143-149: Rollback sys.modules if exec fails.

Line 143 inserts the module before exec; if exec errors, the partially initialized module remains cached. Clean it up on failure to avoid stale imports.

🛠️ Proposed fix
-            if let Some(exec) = def.slots.exec {
-                exec(vm, &module)?;
-            }
+            if let Some(exec) = def.slots.exec {
+                if let Err(err) = exec(vm, &module) {
+                    let _ = sys_modules.del_item(&*name, vm);
+                    return Err(err);
+                }
+            }
crates/derive-impl/src/pymodule.rs (1)

85-90: Guard against cfg‑gated module_exec to avoid build breaks.
Line 85–90 marks any module_exec as present, but def.slots.exec = Some(module_exec) is always emitted. If a user #[cfg]‑gates module_exec, builds with the cfg disabled will fail due to an unresolved symbol. This was previously flagged; consider rejecting cfg‑gated module_exec or requiring a non‑gated fallback.

🛠️ Proposed fix
-        if let Item::Fn(func) = item
-            && func.sig.ident == "module_exec"
-        {
-            context.has_module_exec = true;
-        }
+        if let Item::Fn(func) = item
+            && func.sig.ident == "module_exec"
+        {
+            let has_cfg = func.attrs.iter().any(|a| a.path().is_ident("cfg"));
+            if has_cfg {
+                context.errors.push(syn::Error::new_spanned(
+                    &func.sig.ident,
+                    "`module_exec` must not be cfg-gated; move the cfg to the module or provide a non-gated fallback",
+                ));
+            } else {
+                context.has_module_exec = true;
+            }
+        }
#!/bin/bash
# Find cfg-gated module_exec definitions to validate impact.
rg -n --type=rust '#\[cfg[^\]]*\]\s*\n\s*fn\s+module_exec' -C2
crates/vm/src/stdlib/mod.rs (1)

96-102: Avoid duplicate posix::module_def registration on WASI.
Both cfg blocks match on WASI, so posix::module_def(ctx) is added twice. Keep a single entry to prevent double registration.

🛠 Suggested fix
-        #[cfg(any(unix, target_os = "wasi"))]
-        posix::module_def(ctx),
+        #[cfg(unix)]
+        posix::module_def(ctx),
crates/wasm/src/vm_class.rs (1)

29-34: Propagate __module_exec errors.
The return value is ignored, so init failures won’t surface.

🔧 Suggested fix
-        __module_exec(vm, module);
+        __module_exec(vm, module)?;
crates/vm/src/stdlib/sys.rs (1)

168-174: Deduplicate sys.builtin_module_names entries.

module_defs may already contain sys/builtins, so pushing them unconditionally can introduce duplicates in the exported tuple. Add dedup() after sorting to ensure uniqueness.

🩹 Proposed fix
         module_names.sort();
+        module_names.dedup();
🧹 Nitpick comments (2)
crates/vm/src/stdlib/sysconfigdata.rs (1)

14-23: Optional: skip redundant sys.modules overwrite.

At Line 21–23, consider checking if sys.modules already contains the platform-specific key before setting it to avoid unnecessary mutation during reloads.

crates/vm/src/vm/interpreter.rs (1)

130-146: Consider returning global_state from the VM rather than cloning.

The comment on line 131 mentions "Don't clone here" but line 145 does clone vm.state. Since vm.state is already a PyRc<PyGlobalState>, the clone is cheap (just reference count bump). However, the function could simply return vm.state.clone() directly without the intermediate variable for clarity.

♻️ Optional simplification
     vm.initialize();

-    // Clone global_state for Interpreter after all initialization is done
-    let global_state = vm.state.clone();
-    (vm, global_state)
+    (vm, vm.state.clone())
 }

Hide details View details @youknowone youknowone merged commit 47df6dd into RustPython:main Jan 22, 2026
14 checks passed
@youknowone youknowone deleted the multi-phase-module-init branch January 22, 2026 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant