initial sandbox #7035

youknowone

Summary by CodeRabbit

New Features
- Introduced a host vs sandbox runtime mode: many stdlib modules (os, socket, signal, file I/O, ssl, etc.) are now available only in host mode; sandbox mode is restricted.
- Added a lightweight sandbox stdio implementation for non-host environments.
Tests / Chores
- CI now validates sandbox builds and runs a new sandbox smoke test to verify runtime behavior.

coderabbitai

📝 Walkthrough

Walkthrough

Adds a new Cargo feature flag host_env; gates many host-specific stdlib modules, file-execution, signal and stdio behavior behind it; centralizes OSError construction via an OSErrorBuilder path; introduces SandboxStdio for non-host builds; adds CI sandbox checks and a sandbox smoke test.

Changes

Cohort / File(s)	Summary
Workspace & Crate Features `Cargo.toml`, `crates/vm/Cargo.toml`, `crates/stdlib/Cargo.toml`	Introduce `host_env` feature and add it to default feature sets.
Stdlib Module Gating `crates/stdlib/src/lib.rs`, `crates/vm/src/stdlib/mod.rs`, `crates/vm/src/stdlib/os.rs`, `crates/vm/src/stdlib/io.rs`, `crates/vm/src/stdlib/sys.rs`, `crates/vm/src/stdlib/thread.rs`	Replace many platform-only cfgs with `feature = "host_env"` (often combined with OS checks); add/remove module gates and adjust exported module_def entries; add SandboxStdio-related sys changes.
VM Core & Initialization `crates/vm/src/vm/mod.rs`, `crates/vm/src/vm/python_run.rs`, `crates/vm/src/lib.rs`, `src/lib.rs`	Gate file-run and stdio construction behind `host_env`; provide sandbox fallback stdio; move file-execution helpers into a host-gated `file_run` module; adjust importlib initialization gating.
Exceptions & Conversions `crates/vm/src/exceptions.rs`, `crates/vm/src/stdlib/os.rs`, `crates/vm/src/stdlib/io.rs`	Centralize OS error → Python exception construction using a new OSErrorBuilder and ToOSErrorBuilder paths; add IntoPyException/ToPyException impls funneling through builder; remove duplicate/older impls from stdlib modules.
Import System `crates/vm/src/import.rs`	Add `#[cfg(feature = "host_env")]` `init_importlib_package` and move TryFromObject/PyListRef imports/usage into the host-gated function; make OS builtin import unconditional.
Signal & Interrupts `crates/vm/src/signal.rs`	Tighten cfgs for `clear_after_fork` and `set_interrupt_ex` to require `feature = "host_env"` alongside existing platform gates.
Testing & CI & Misc `extra_tests/snippets/sandbox_smoke.py`, `.github/workflows/ci.yaml`, `.cspell.json`	Add a sandbox smoke test, CI sandbox compilation and smoke-test steps (Linux-only), and add `"rustix"` to cspell dictionary.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as CLI / Binary
  participant VM as VirtualMachine
  participant FS as Filesystem
  participant Loader as Importlib/Loader
  participant StdLib as stdlib modules

  Note over CLI,VM: host_env enabled?
  CLI->>VM: run_any_file(path) (host_env)
  VM->>FS: stat/read path
  FS-->>VM: path type (.py / dir / .pyc)
  VM->>Loader: choose loader / set_main_loader
  Loader-->>VM: code object
  VM->>StdLib: initialize modules (host_env gated)
  VM->>VM: execute code
  VM-->>CLI: exit status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Integrate OSError creations into OSErrorBuilder #6443 — Overlapping refactor of OSErrorBuilder / ToOSErrorBuilder and exception construction.
pythonrun #6643 — Overlaps in relocating run_any_file/run_script and file-execution helpers in python_run.rs.
use posix error messages under 127 #6420 — Related changes to error→Python-exception conversions in stdlib/io.rs.

Suggested reviewers

fanninpm
coolreader18

Poem

🐇 I hopped into code with a curious grin,
gates named host_env let host modules in.
Builders craft OSErrors, sandbox sleeps yet sings,
CI runs the smoke test, and tiny joy it brings.
✨ — a rabbit’s thump for neat, gated things.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The PR title 'initial sandbox' is vague and generic. While sandbox functionality is implemented throughout, the title lacks specificity about what 'initial sandbox' means—it could refer to a feature gate, a new sandbox mode, or foundational infrastructure.	Consider revising to a more descriptive title such as 'Add host_env feature gate for sandbox mode' or 'Introduce sandbox execution environment' to clarify the primary change.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions

Code has been automatically formatted

The code in this PR has been formatted using:

ruff check --select I --fix
Please pull the latest changes before pushing again:

git pull origin sandbox

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/vm/src/exceptions.rs (1)

2148-2158: ⚠️ Potential issue | 🟡 Minor

Setter allows storing sentinel value -1, causing subsequent access to raise AttributeError.

If Python code sets exc.characters_written = -1, the setter stores it without validation. The getter then treats -1 as a sentinel for "not set" and raises AttributeError. This diverges from CPython, where any explicitly set integer value (including -1) is retrievable as-is. The setter should validate that input values differ from the sentinel before storing.

🤖 Fix all issues with AI agents

In `@crates/vm/src/lib.rs`:
- Around line 70-71: The import of the ospath module in
crates/vm/src/stdlib/io.rs must be conditionally compiled to match the module
declaration (pub mod ospath) which is #[cfg(feature = "host_env")] in lib.rs;
locate the import statement that references ospath (the use/mod of ospath around
line ~5059 in stdlib/io.rs) and prefix it with #[cfg(feature = "host_env")] so
the import is only compiled when the host_env feature is enabled.

In `@crates/vm/src/stdlib/mod.rs`:
- Around line 101-104: The two cfg attributes both enable posix::module_def(ctx)
on WASI causing a duplicate; update the second attribute so it excludes WASI
(e.g. change #[cfg(all(feature = "host_env", not(any(unix, windows)))] to
exclude target_os = "wasi", such as #[cfg(all(feature = "host_env",
not(any(unix, windows, target_os = "wasi"))))]) so posix::module_def(ctx) is
only added once.

In `@crates/vm/src/stdlib/sys.rs`:
- Around line 122-132: The readline implementation currently ignores the
OptionalArg<isize> _size; update TextIOBase::readline (function readline in
sys.rs) to explicitly reject non-default size values instead of ignoring them:
before reading from stdin, check if _size is Present (i.e., Some) and if so
return an appropriate VM error (e.g., a NotImplementedError or ValueError
created via VirtualMachine like vm.new_not_implemented or vm.new_value_error)
with a clear message such as "readline with size not supported in sandbox";
otherwise proceed with the existing fd check and stdin.read_line flow.
- Around line 108-120: The write method currently swallows IO errors; change the
write implementation in the Sys* struct's #[pymethod] fn write(&self, s:
PyStrRef, vm: &VirtualMachine) -> PyResult<usize> to propagate errors from
stdout/stderr by replacing the `let _ = ...` ignores with
`.write_all(bytes).map_err(|e| vm.new_os_error(e.to_string()))?` so any write
failures become Py OSError; likewise implement a #[pymethod] fn flush(&self, vm:
&VirtualMachine) -> PyResult<()> (add the vm parameter) and call the appropriate
stream's `flush().map_err(|e| vm.new_os_error(e.to_string()))?` instead of
ignoring errors, returning Ok(()) on success.

🧹 Nitpick comments (8)

src/lib.rs (1)
176-189: Sandbox fallback design is sound but loses OS error detail.

The sandbox path correctly keeps filesystem I/O in the binary (trusted host) while the VM only sees source strings. Two minor observations:

std::fs::read_to_string assumes UTF-8, which is the Python 3 default source encoding — fine for the common case.

Line 187: err.to_string() discards the original io::ErrorKind/errno. Consider preserving it if the VM's new_os_error supports it, though this is low priority for a sandbox fallback.
Optional: preserve errno in sandbox error path
-            Err(err) => Err(vm.new_os_error(err.to_string())),
+            Err(err) => {
+                let msg = format!("Can't open file '{}': {}", path, err);
+                Err(vm.new_os_error(msg))
+            }
extra_tests/snippets/sandbox_smoke.py (1)
19-21: Minor: sys.stderr is not tested for writability.

stdin/stdout are tested but stderr is mentioned in the docstring (line 4) and available via SandboxStdio. Consider adding assert sys.stderr.writable() for completeness.
Optional: add stderr assertion
 assert sys.stdout.writable()
 assert sys.stdin.readable()
+assert sys.stderr.writable()
 assert sys.stdout.fileno() == 1
crates/vm/src/vm/python_run.rs (2)
149-155: Read::read may short-read even on regular files.

file.read(&mut buf)? is not guaranteed to fill the 2-byte buffer in a single call. While exceedingly rare for regular files, Read::read_exact would be the robust choice here.
Suggested fix
-        use std::io::Read;
-        if file.read(&mut buf)? != 2 {
+        use std::io::Read;
+        if file.read_exact(&mut buf).is_err() {
109-112: Commented-out deprecation on run_script.

The #[deprecated] attribute is commented out. If this is intentional for now, consider adding a brief // TODO explaining when the deprecation should be activated. Otherwise it's easy for this to be forgotten.
crates/vm/src/exceptions.rs (4)
1307-1345: build() relies on expect() for infallible paths — verify they truly can't fail.

Lines 1338, 1341, and 1343 each call .expect("new_os_error usage error"). While py_new always returns Ok(...) for the current implementation and into_ref_with_type should succeed when using the same exc_type, slot_init is more complex and could theoretically fail (e.g., on a subclass that overrides __init__). If any of these paths fail, the panic will be hard to diagnose.

Consider whether slot_init could fail for exotic subclasses and, if so, whether returning a PyResult from build would be more appropriate. If these are truly infallible, a brief comment on why would help future maintainers.

1320-1335: On non-Windows, a dummy None is injected at position 3 (winerror slot) into the args vec.

When errno is Some, the args vector is always 5 elements: [errno, strerror, filename, winerror_or_none, filename2]. On non-Windows, winerror is vm.ctx.none(). This means slot_init receives 5 args on all platforms — which works because slot_init only processes args[3] inside a #[cfg(windows)] block. However, this couples the builder's arg layout to slot_init's conditional parsing.

This is fine as-is, but a brief comment noting that the 4th position is always None on non-Windows (and ignored by slot_init) would prevent confusion.

1375-1385: libc::strerror is not guaranteed thread-safe.

On Unix, libc::strerror may return a pointer to a shared static buffer (for unknown errno values). While most modern libc implementations use static string literals for known errno values, the general case is not thread-safe. Since the result is immediately copied via to_string_lossy().into_owned(), the risk is low, but it's worth noting.

If thread-safety is a concern, strerror_r (POSIX) or strerror_l would be safer alternatives, though they come with their own portability complications.

1926-1926: Minor: Cloning the atomic ref just to check for None.

exc.filename.to_owned() clones the PyObjectRef behind the atomic ref only to check is_none. Consider using a read/deref approach if available to avoid the unnecessary clone:
-let has_filename = exc.filename.to_owned().filter(|f| !vm.is_none(f)).is_some();
+let has_filename = exc.filename.deref().is_some_and(|f| !vm.is_none(f));
(Depends on whether deref() returns a suitable reference — adjust as needed for the PyAtomicRef API.)

ShaharNaveh

I like the idea a lot

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/vm/src/vm/mod.rs (1)

326-394: ⚠️ Potential issue | 🔴 Critical

Fix conflicting cfg conditions for make_stdio branches.

The host_env and stdio features are independent in Cargo.toml (neither depends on the other), but the three cfg branches are not mutually exclusive. When host_env is enabled and stdio is disabled, both the first branch (#[cfg(feature = "host_env")]) and the third branch (#[cfg(not(feature = "stdio"))]) match simultaneously, causing a duplicate make_stdio binding and a compile error.

Change the third condition from #[cfg(not(feature = "stdio"))] to #[cfg(all(not(feature = "host_env"), not(feature = "stdio")))] to ensure all three branches are truly mutually exclusive.

🤖 Fix all issues with AI agents

In `@crates/vm/src/stdlib/sys.rs`:
- Around line 126-141: The readline implementation in SandboxStdio::readline
treats the size parameter as a byte index and calls line.truncate(size as
usize), which can panic on UTF‑8 boundaries for non‑ASCII input; change the
truncation to operate on character count instead of raw bytes—compute the byte
boundary for the first N characters (size) or rebuild the string with
line.chars().take(size as usize).collect(), then return that safely truncated
String while keeping the existing checks (handle size <= 0 and OptionalArg
unwrap as before).

🧹 Nitpick comments (1)

crates/vm/src/exceptions.rs (1)

1375-1385: libc::strerror is not guaranteed thread-safe on all platforms.

On some platforms, strerror returns a pointer to a shared static buffer. While glibc uses thread-local storage, POSIX doesn't guarantee thread-safety. The immediate .to_string_lossy().into_owned() conversion minimizes the risk window, but strerror_r (or Rust's std::io::Error::from_raw_os_error(errno).to_string()) would be a safer alternative.

Introduce a `host_env` feature flag that gates all host environment access (filesystem, network, signals, processes). When disabled, the VM operates in sandbox mode: - _io module always available; FileIO gated by host_env - SandboxStdio provides lightweight stdin/stdout/stderr via Rust std::io - BytesIO/StringIO/BufferedIO/TextIOWrapper work without host_env - open() returns UnsupportedOperation in sandbox - stdlib modules (os, socket, signal, etc.) gated by host_env - CI checks both host_env ON and OFF builds

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @.github/workflows/ci.yaml:
- Around line 156-167: Update the CI job to build with the compiler feature and
run sandbox snippets using cargo run instead of invoking the binary directly:
add the compiler feature to the build line (the step that currently runs cargo
build --no-default-features --features
stdlib,importlib,stdio,encodings,freeze-stdlib) so it becomes consistent with
the check steps, and replace the two direct binary executions
(target/debug/rustpython extra_tests/snippets/sandbox_smoke.py and
target/debug/rustpython extra_tests/snippets/stdlib_re.py) in the "sandbox smoke
test" step with cargo run --bin rustpython -- <snippet-path> (run in debug mode)
to follow the extra_tests/snippets guideline and ensure faster, reproducible
builds.

In `@crates/vm/src/exceptions.rs`:
- Around line 1354-1398: The impl of ToOSErrorBuilder for std::io::Error
currently always calls OSErrorBuilder::with_errno(errno, msg, vm) even when
posix_errno() returns 0; change to check the errno returned by posix_errno() and
only call with_errno when errno > 0, otherwise call
OSErrorBuilder::with_subtype(py_exception_type_for_os_error, None, msg, vm)
(preserving the msg) so we don't emit OSError(errno=0); update the
to_os_error_builder function around the creation of builder (the let mut builder
= ... and subsequent #[cfg(windows)] winerror handling) to branch on errno
accordingly while keeping the winerror attachment when available.

🧹 Nitpick comments (1)

crates/vm/src/signal.rs (1)
82-83: Nit: inconsistent #[cfg] style with set_interrupt_ex below.

Lines 82–83 use two stacked #[cfg(...)] attributes, while line 103 combines them into a single #[cfg(all(...))]. Both are semantically equivalent, but using a consistent style improves readability.
♻️ Suggested consolidation
-#[cfg(unix)]
-#[cfg(feature = "host_env")]
+#[cfg(all(unix, feature = "host_env"))]
 pub(crate) fn clear_after_fork() {

youknowone force-pushed the sandbox branch 2 times, most recently from c1c187b to 5069248 Compare February 8, 2026 01:44

youknowone commented Feb 8, 2026

View reviewed changes

youknowone force-pushed the sandbox branch from 5069248 to 2e4d15b Compare February 8, 2026 07:06

youknowone marked this pull request as ready for review February 8, 2026 07:11

youknowone changed the title ~~sandboxing~~ Feb 8, 2026

youknowone requested review from ShaharNaveh, coolreader18 and fanninpm February 8, 2026 07:12

coderabbitai Bot reviewed Feb 8, 2026

View reviewed changes

ShaharNaveh approved these changes Feb 8, 2026

View reviewed changes

youknowone enabled auto-merge (squash) February 8, 2026 11:41

youknowone force-pushed the sandbox branch from 1b9fa6d to 3fcc809 Compare February 8, 2026 11:47

coderabbitai Bot reviewed Feb 8, 2026

View reviewed changes

youknowone force-pushed the sandbox branch from c5db67a to dd10bf2 Compare February 8, 2026 12:52

Auto-format: ruff check --select I --fix

aae539a

coderabbitai Bot reviewed Feb 8, 2026

View reviewed changes

youknowone deleted the sandbox branch February 8, 2026 13:58

This was referenced Feb 8, 2026

Fix test_import: import machinery, circular imports, and script shadowing #7034

Merged

Update hashlib from v3.14.3 and align _hashlib to CPython #7120

Merged

coderabbitai Bot mentioned this pull request Feb 21, 2026

StatelessIncrementalEncoder, sys attributes #7195

Merged

coderabbitai Bot mentioned this pull request Mar 10, 2026

Rename stdlib files to match Python module names #7397

Merged

This was referenced Apr 21, 2026

Add minimal capi lifecycle support #7648

Merged

more hostenv impl #7604

Merged

coderabbitai Bot mentioned this pull request Apr 30, 2026

Use cfg_select in a bunch more places #7740

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

initial sandbox #7035