More fork safety#7380

youknowone

Summary by CodeRabbit

New Features
- Thread API: ThreadHandle adds ident, is_done, _set_done and join(timeout); thread-start APIs accept consolidated arg form.
- Mutexes: lock wait points can run a user-provided wrapper while waiting.
Bug Fixes
- Reduced spurious signal handling and tightened stop-the-world/shutdown synchronization to avoid lost wakeups and races.
- Improved fork-time warnings when forking from multi-threaded processes.
Performance
- Blocking I/O and OS waits now release the interpreter lock for better concurrency.

coderabbitai

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a lock_wrapped API and uses it with vm.allow_threads to release the VM/GIL during blocking waits; refactors stop-the-world state to world_stopped plus a thread_countdown, adds per-thread stop_requested, tightens signal-exception gating, and overhauls thread start/join/shutdown lifecycles.

Changes

Cohort / File(s)	Summary
Lock wrapping `crates/common/src/lock/thread_mutex.rs`	Added `lock_wrapped` to `RawThreadMutex` and `ThreadMutex` to run a user-provided wrapper (e.g., `vm.allow_threads`) around the blocking lock operation.
GIL-release for OS waits & multiplexing `crates/stdlib/src/multiprocessing.rs`, `crates/stdlib/src/select.rs`	Blocking OS calls (semaphores, `select`, `poll`, `epoll`) now execute inside `vm.allow_threads` closures so the VM/GIL is released during OS-level waits.
I/O locking and blocking behavior `crates/vm/src/stdlib/io.rs`	Replaced direct mutex locks with `lock_wrapped(
Signal handling `crates/vm/src/frame.rs`, `crates/vm/src/signal.rs`	Signal-exception handling in ExecutingFrame::run is now gated by `eval_breaker_tripped()`; added `pub(crate) fn is_triggered()` and updated the signal-exception handler signature.
Stop-the-world redesign `crates/vm/src/vm/mod.rs`, `crates/vm/src/vm/thread.rs`	Replaced detach-yields with `world_stopped: AtomicBool` and `thread_countdown: AtomicI64`; added per-thread `stop_requested` bit, countdown helpers, requester/notify helpers, and simplified attach/detach/suspend/resume flows; changed `detach_thread` signature.
Thread API & lifecycle overhaul `crates/vm/src/stdlib/thread.rs`	Refactored `start_new_thread`/`start_joinable_thread` to accept `FuncArgs`, added strict arg validation and audit hook calls, added `ThreadHandle` methods (`ident`, `is_done`, `_set_done`, `join`) and join helpers; reworked shutdown and join semantics.
Posix fork/thread counting `crates/vm/src/stdlib/posix.rs`	Added OS-thread-count heuristics (macOS/Linux/fallback) and refactored fork warning logic to use OS data when available and warn after parent resumes.

Sequence Diagram(s)

sequenceDiagram
    participant Py as Python code (caller)
    participant VM as VirtualMachine
    participant Mutex as ThreadMutex/RawThreadMutex
    participant OS as OS blocking syscall

    Py->>VM: request lock_wrapped(wrap_fn)
    VM->>Mutex: lock_wrapped(wrap_fn)
    Mutex->>VM: invoke wrap_fn(do_lock)
    VM->>OS: vm.allow_threads(do_lock)  -- releases VM/GIL
    OS-->>Mutex: blocking wait returns
    Mutex-->>VM: set owner and return
    VM-->>Py: return guard / success

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~110 minutes

Possibly related PRs

more algorithm-independent GC infra #7194: Related GC finalization changes that interact with interpreter shutdown and thread lifecycle.
Suspend Python threads before fork() #7364: Overlaps stop-the-world and lock/fork coordination changes; touches similar locking and reinit paths.
Update test_sys from v3.14.3 and impl more sys module #7075: Modifies ExecutingFrame::run and signal/exception handling, directly related to frame/signal edits here.

Suggested reviewers

ShaharNaveh
fanninpm

Poem

🐇
I nudged the lock and let it sleep,
While other threads could dance and leap.
A breaker hums, a countdown peeps,
I boxed the wait and freed the heap,
Hooray — concurrency’s tiny beep!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'More fork safety' is vague and generic, using non-descriptive language that doesn't clearly convey the specific changes made in this comprehensive pull request.	Consider using a more descriptive title that captures the main focus, such as 'Add lock_wrapped for GIL-safe mutex operations' or 'Improve thread safety in fork and stop-the-world logic'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] lib: cpython/Lib/asyncio
[ ] test: cpython/Lib/test/test_asyncio (TODO: 37)

dependencies:

asyncio (native: _asyncio, _overlapped, _pyrepl.console, _pyrepl.main, _pyrepl.simple_interact, _remote_debugging, _winapi, asyncio.tools, base_events, collections.abc, concurrent.futures, coroutines, errno, events, exceptions, futures, graph, itertools, locks, log, math, msvcrt, protocols, queues, readline, runners, streams, sys, taskgroups, tasks, threads, time, timeouts, transports, unix_events, windows_events)
- collections (native: _collections, _weakref, itertools, sys)
- logging (native: atexit, collections.abc, email.message, email.utils, errno, http.client, logging.handlers, multiprocessing.queues, select, sys, time, urllib.parse, win32evtlog, win32evtlogutil)
- site (native: _io, _pyrepl.main, _pyrepl.pager, _pyrepl.readline, _pyrepl.unix_console, _pyrepl.windows_console, atexit, builtins, errno, readline, sitecustomize, sys, usercustomize)
- tokenize (native: _tokenize, builtins, itertools, sys)
- _colorize, argparse, ast, contextlib, contextvars, dataclasses, enum, functools, heapq, inspect, io, linecache, os, reprlib, rlcompleter, selectors, signal, socket, ssl, stat, struct, subprocess, tempfile, threading, traceback, types, warnings, weakref

dependent tests: (7 tests)

asyncio: test_asyncio test_contextlib_async test_inspect test_logging test_os test_sys_settrace test_unittest

[x] lib: cpython/Lib/threading.py
[x] lib: cpython/Lib/_threading_local.py
[ ] test: cpython/Lib/test/test_threading.py (TODO: 16)
[ ] test: cpython/Lib/test/test_threadedtempfile.py
[ ] test: cpython/Lib/test/test_threading_local.py (TODO: 3)

dependencies:

threading

dependent tests: (148 tests)

threading: test_android test_asyncio test_bz2 test_code test_concurrent_futures test_contextlib test_ctypes test_decimal test_docxmlrpc test_email test_enum test_fork1 test_frame test_ftplib test_functools test_gc test_hashlib test_httplib test_httpservers test_imaplib test_importlib test_inspect test_io test_itertools test_largefile test_linecache test_logging test_memoryview test_opcache test_pathlib test_poll test_queue test_robotparser test_sched test_signal test_smtplib test_socket test_socketserver test_sqlite3 test_ssl test_subprocess test_super test_sys test_syslog test_termios test_threadedtempfile test_threading test_threading_local test_time test_urllib2_localnet test_weakref test_winreg test_wsgiref test_xmlrpc test_zstd
- asyncio: test_asyncio test_contextlib_async test_os test_sys_settrace test_unittest
- bdb: test_bdb
- concurrent.futures._base: test_concurrent_futures
- concurrent.futures.process: test_compileall test_concurrent_futures
- concurrent.futures.thread: test_genericalias
- dummy_threading: test_dummy_threading
- http.cookiejar: test_http_cookiejar test_urllib2
  - urllib.request: test_pathlib test_pydoc test_sax test_site test_urllib test_urllib2net test_urllibnet
- importlib.util: test_ctypes test_doctest test_importlib test_pkgutil test_py_compile test_reprlib test_runpy test_zipfile test_zipimport
  - py_compile: test_argparse test_cmd_line_script test_importlib test_multiprocessing_main_handling
  - pyclbr: test_pyclbr
  - sysconfig: test_c_locale_coercion test_dtrace test_launcher test_osx_env test_posix test_pyexpat test_regrtest test_support test_sysconfig test_tools test_venv
  - zipfile: test_shutil test_zipapp test_zipfile test_zipfile64
- logging: test_unittest
  - hashlib: test_hmac test_tarfile test_unicodedata
- multiprocessing: test_fcntl test_re
- queue: test_dummy_thread
- subprocess: test_atexit test_audit test_cmd_line test_ctypes test_faulthandler test_file_eintr test_gzip test_json test_msvcrt test_ntpath test_platform test_plistlib test_quopri test_repl test_script_helper test_select test_tempfile test_traceback test_unittest test_utf8_mode test_wait3 test_webbrowser
  - ctypes.util: test_ctypes
  - ensurepip: test_ensurepip
  - platform: test__locale test__osx_support test_baseexception test_builtin test_cmath test_math test_mimetypes
- sysconfig:
  - trace: test_trace
- zipfile:
  - shutil: test_filecmp test_glob test_string_literals test_unicode_file test_zoneinfo

Legend:

[+] path exists in CPython
[x] up-to-date, [ ] outdated

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/vm/src/vm/thread.rs (1)
187-209: ⚠️ Potential issue | 🔴 Critical

Detaching a requested thread needs to wake the stop-the-world requester.

When park_detached_threads() marks an ATTACHED thread with stop_requested = true, a subsequent call to detach_thread() (via allow_threads()) can transition it to THREAD_DETACHED without notifying the requester. This leaves the thread countdown outstanding. The blocked stop_the_world() loop will only advance when either the countdown reaches zero or park_detached_threads() is called again to rescan the detached thread, causing unnecessary stalling.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/thread.rs` around lines 187 - 209, detaching an ATTACHED
thread can race with a prior park_detached_threads() that set
stop_requested=true and so detach_thread() must notify the stop-the-world
requester; update detach_thread() (inside CURRENT_THREAD_SLOT handling when
compare_exchange succeeds going from THREAD_ATTACHED to THREAD_DETACHED) to
check the thread's stop_requested flag and, if true, call the same notification
used by park_detached_threads()/stop_the_world (i.e., invoke the stop-the-world
wake/notify function used elsewhere) so the stop_the_world() countdown/loop is
advanced when a requested thread is detached.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/io.rs`:
- Line 5339: The code currently uses std::io::Read::read_to_end(), which hides
EINTR and prevents vm.check_signals() from running on the unbounded-read path
(size < 0); replace that call with an explicit chunked loop that uses
crt_fd::read() to fill a Vec<u8> (read fixed-size chunks, e.g., 8KiB), append
each successful chunk to the buffer, break on EOF (0 bytes), and on read errors
return Err so EINTR is propagated to the outer error handler; make sure to
reference and replace the read_to_end() call site and keep vm.check_signals()
semantics by letting crt_fd::read() surface EINTR instead of swallowing it.

In `@crates/vm/src/vm/mod.rs`:
- Around line 334-343: The fast-return path when initial_countdown == 0 leaves
stats_last_wait_ns unchanged; update the code in the branch (inside the block
after self.world_stopped.store(...) and before returning) to reset the VM's
last-wait statistic (the atomic backing field used by
stats_last_wait_ns/stats_snapshot) to zero using the same atomic semantics used
elsewhere (e.g., an atomic store with appropriate Ordering), so stats_snapshot()
does not report the previous stop's wait time after a no-wait stop; locate the
change around init_thread_countdown and world_stopped in stop() in vm::mod.rs.
- Around line 1931-1949: The fast-path in eval_breaker_tripped() performs a
Relaxed load of state.finalizing which can bypass the later Acquire load in
check_signals(), breaking the Release→Acquire synchronization from the shutdown
write; change the atomic load on state.finalizing in eval_breaker_tripped() to
Ordering::Acquire (or otherwise ensure an Acquire-load or appropriate fence is
executed when checking finalizing) so that the shutdown write in interpreter.rs
(Release) becomes visible on other threads; references: eval_breaker_tripped,
state.finalizing, check_signals, and the shutdown write in interpreter.rs.

In `@crates/vm/src/vm/thread.rs`:
- Around line 426-462: The cleanup_current_thread_frames path can call
vm.state.stop_the_world.notify_thread_gone() for threads that were
THREAD_DETACHED before the stop-the-world began; only decrement the requester
countdown for threads that were counted by init_thread_countdown. Update the
final conditional in cleanup_current_thread_frames to also ensure the removed
slot's state is not THREAD_DETACHED (e.g. add slot.state.load(...) !=
THREAD_DETACHED alongside the existing THREAD_SUSPENDED check) before calling
stop_the_world.notify_thread_gone(); reference symbols:
cleanup_current_thread_frames, vm.state.stop_the_world.notify_thread_gone,
init_thread_countdown, THREAD_DETACHED, THREAD_SUSPENDED, slot.state.
- Around line 247-252: Reads of the stop-request bit in suspend_if_needed and
stop_requested_for_current_thread currently use Ordering::Relaxed; change those
.load(Ordering::Relaxed) calls on the stop_requested AtomicBool to
.load(Ordering::Acquire) so they synchronize with park_detached_threads() which
publishes the bit with Ordering::Release (ensuring eval_breaker_tripped() sees
the update and threads suspend correctly).

---

Outside diff comments:
In `@crates/vm/src/vm/thread.rs`:
- Around line 187-209: detaching an ATTACHED thread can race with a prior
park_detached_threads() that set stop_requested=true and so detach_thread() must
notify the stop-the-world requester; update detach_thread() (inside
CURRENT_THREAD_SLOT handling when compare_exchange succeeds going from
THREAD_ATTACHED to THREAD_DETACHED) to check the thread's stop_requested flag
and, if true, call the same notification used by
park_detached_threads()/stop_the_world (i.e., invoke the stop-the-world
wake/notify function used elsewhere) so the stop_the_world() countdown/loop is
advanced when a requested thread is detached.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 9daf3542-35f7-4fa9-8ecd-777f0f7c9a5d

📥 Commits

Reviewing files that changed from the base of the PR and between f680f8a and b29c3b8.

⛔ Files ignored due to path filters (2)

Lib/test/test_asyncio/test_events.py is excluded by !Lib/**
Lib/test/test_threading.py is excluded by !Lib/**

📒 Files selected for processing (9)

crates/common/src/lock/thread_mutex.rs
crates/stdlib/src/multiprocessing.rs
crates/stdlib/src/select.rs
crates/vm/src/frame.rs
crates/vm/src/signal.rs
crates/vm/src/stdlib/io.rs
crates/vm/src/stdlib/thread.rs
crates/vm/src/vm/mod.rs
crates/vm/src/vm/thread.rs

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1210-1247: The join_internal helper currently enforces public join
validations (calls check_started and checks VM finalization/current-ident) which
breaks callers from _shutdown; update join_internal to accept an internal flag
(e.g., for_shutdown: bool) or a separate private helper so _shutdown can bypass
public-only checks: remove or guard the check_started(inner, vm) call and the
finalization/current-thread checks inside join_internal (references:
join_internal, check_started, _shutdown, ThreadHandle_join, get_ident,
ThreadHandleState::Running, vm.state.finalizing) and ensure the public join()
wrapper still performs check_started and finalization checks before delegating
to join_internal(for_shutdown=false) while _shutdown calls
join_internal(for_shutdown=true) to skip those validations.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 300d9b9c-9e25-44bf-b7b6-e9cd4c933f3a

📥 Commits

Reviewing files that changed from the base of the PR and between b29c3b8 and dc18133.

📒 Files selected for processing (1)

crates/vm/src/stdlib/thread.rs

coderabbitai

♻️ Duplicate comments (1)

crates/vm/src/stdlib/thread.rs (1)

1210-1247: ⚠️ Potential issue | 🔴 Critical

Keep _shutdown() off the public join() validation path.

join_internal() still runs check_started() and the finalization guard even when _shutdown() is the caller. That means shutdown can legitimately hit Starting / NotStarted handles from the registry, or trip the finalizing check, and then _shutdown() logs the exception and returns early instead of waiting for the remaining non-daemon threads. The public checks need to stay in join(), or join_internal() needs an internal mode that skips them for shutdown/fork cleanup callers.

Possible direction

-        fn join_internal(
+        fn join_internal(
             inner: &Arc<parking_lot::Mutex<ThreadHandleInner>>,
             done_event: &Arc<(parking_lot::Mutex<bool>, parking_lot::Condvar)>,
             timeout_duration: Option<Duration>,
+            public_checks: bool,
             vm: &VirtualMachine,
         ) -> PyResult<()> {
-            Self::check_started(inner, vm)?;
+            if public_checks {
+                Self::check_started(inner, vm)?;
+            }

             let deadline =
                 timeout_duration.and_then(|timeout| std::time::Instant::now().checked_add(timeout));

             let (lock, cvar) = &**done_event;
             let mut done = lock.lock();

-            if !*done {
+            if public_checks && !*done {
                 let inner_guard = inner.lock();
                 let current_ident = get_ident();
                 if inner_guard.ident == current_ident
                     && inner_guard.state == ThreadHandleState::Running
                 {
                     return Err(vm.new_runtime_error("Cannot join current thread"));
                 }
                 if vm.state.finalizing.load(core::sync::atomic::Ordering::Acquire) {
                     return Err(...);
                 }
             }

-            Self::join_internal(&self.inner, &self.done_event, timeout_duration, vm)
+            Self::join_internal(&self.inner, &self.done_event, timeout_duration, true, vm)

-                    if let Err(exc) = ThreadHandle::join_internal(&inner, &done_event, None, vm) {
+                    if let Err(exc) =
+                        ThreadHandle::join_internal(&inner, &done_event, None, false, vm)
+                    {

Also applies to: 1314-1325

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 1210 - 1247, The join_internal()
path runs public validations (check_started() and the vm.state.finalizing guard)
even when invoked from internal shutdown/fork cleanup, so change the call
boundary: move the public checks back into the public join() method and/or add
an internal-only mode to join_internal() (e.g., add a bool flag like
skip_public_checks or is_shutdown_call to join_internal(inner, done_event,
timeout_duration, vm, skip_public_checks: bool)), have join() call
join_internal(..., false) and _shutdown()/fork cleanup call join_internal(...,
true), and inside join_internal() skip check_started() and the finalizing check
when skip_public_checks is true; update all callers (including _shutdown()) to
use the new flag and keep ThreadHandleState/ThreadHandle_join semantics intact.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1210-1247: The join_internal() path runs public validations
(check_started() and the vm.state.finalizing guard) even when invoked from
internal shutdown/fork cleanup, so change the call boundary: move the public
checks back into the public join() method and/or add an internal-only mode to
join_internal() (e.g., add a bool flag like skip_public_checks or
is_shutdown_call to join_internal(inner, done_event, timeout_duration, vm,
skip_public_checks: bool)), have join() call join_internal(..., false) and
_shutdown()/fork cleanup call join_internal(..., true), and inside
join_internal() skip check_started() and the finalizing check when
skip_public_checks is true; update all callers (including _shutdown()) to use
the new flag and keep ThreadHandleState/ThreadHandle_join semantics intact.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: de154942-d61b-4972-bdb4-43be2e906d06

📥 Commits

Reviewing files that changed from the base of the PR and between dc18133 and d36a0b1.

📒 Files selected for processing (1)

crates/vm/src/stdlib/thread.rs

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

crates/vm/src/stdlib/thread.rs (1)

1211-1249: ⚠️ Potential issue | 🔴 Critical

Keep shutdown waits off the public join() validation path.

join_internal() still rejects Starting handles at Line 1217 and still raises PythonFinalizationError at Lines 1237-1248. _shutdown() legitimately hits both states while draining non-daemon threads, so this helper can still abort shutdown early and leave threads unjoined. Move those checks into the public join() wrapper, or add an internal for_shutdown mode that skips them.

Refactor sketch

-        fn join_internal(
+        fn join_internal(
             inner: &Arc<parking_lot::Mutex<ThreadHandleInner>>,
             done_event: &Arc<(parking_lot::Mutex<bool>, parking_lot::Condvar)>,
             timeout_duration: Option<Duration>,
+            for_shutdown: bool,
             vm: &VirtualMachine,
         ) -> PyResult<()> {
-            Self::check_started(inner, vm)?;
+            if !for_shutdown {
+                Self::check_started(inner, vm)?;
+            }

             let deadline =
                 timeout_duration.and_then(|timeout| std::time::Instant::now().checked_add(timeout));

             let (lock, cvar) = &**done_event;
             let mut done = lock.lock();

-            if !*done {
+            if !*done && !for_shutdown {
                 let inner_guard = inner.lock();
                 let current_ident = get_ident();
                 if inner_guard.ident == current_ident
                     && inner_guard.state == ThreadHandleState::Running
                 {
@@
-            Self::join_internal(&self.inner, &self.done_event, timeout_duration, vm)
+            Self::check_started(&self.inner, vm)?;
+            Self::join_internal(&self.inner, &self.done_event, timeout_duration, false, vm)

-                    if let Err(exc) = ThreadHandle::join_internal(&inner, &done_event, None, vm) {
+                    if let Err(exc) =
+                        ThreadHandle::join_internal(&inner, &done_event, None, true, vm)
+                    {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 1211 - 1249, join_internal
currently enforces "Starting" and interpreter-finalizing checks (via
Self::check_started and the PythonFinalizationError branch) which causes
_shutdown to abort while draining threads; modify join_internal to accept a
for_shutdown: bool parameter (or remove those checks here) and move the
Starting-state verification and the vm.state.finalizing check into the public
join() wrapper so normal joins still validate but calls from _shutdown can call
join_internal(..., for_shutdown=true) (or call a new internal no-validate
helper) to skip those two checks; update all callers (public join() and
_shutdown) to call the appropriate variant and ensure unique symbols referenced:
join_internal, check_started, ThreadHandle_join semantic check, and _shutdown
are adjusted accordingly.

crates/vm/src/vm/mod.rs (2)

1932-1937: ⚠️ Potential issue | 🔴 Critical

Use Acquire for the finalization gate in eval_breaker_tripped().

Line 1935 still does a Relaxed read even though this branch decides whether check_signals() runs at all. A stale false here skips the later Acquire load in check_signals(), so non-main threads can miss finalization on weak memory models.

Proposed fix

         #[cfg(feature = "threading")]
-        if self.state.finalizing.load(Ordering::Relaxed) && !self.is_main_thread() {
+        if self.state.finalizing.load(Ordering::Acquire) && !self.is_main_thread() {
             return true;
         }

#!/bin/bash
set -euo pipefail

printf '\n== eval_breaker_tripped ==\n'
sed -n '1932,1958p' crates/vm/src/vm/mod.rs

printf '\n== bytecode-loop gate ==\n'
sed -n '1198,1206p' crates/vm/src/frame.rs

printf '\n== finalizing loads/stores ==\n'
rg -n -C2 'finalizing\.(load|store)\(' crates/vm/src

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 1932 - 1937, The read of the
finalization gate in eval_breaker_tripped() uses Ordering::Relaxed which can let
a stale false skip check_signals() on weak memory models; change the load to use
Ordering::Acquire (i.e., replace state.finalizing.load(Ordering::Relaxed) with
state.finalizing.load(Ordering::Acquire)) so that the subsequent
check_signals()'s synchronization is not bypassed, leaving all other logic in
eval_breaker_tripped() and callers like check_signals() unchanged.

332-339: ⚠️ Potential issue | 🟡 Minor

Reset stats_last_wait_ns on the zero-thread fast path.

When Line 332 returns immediately, stats_snapshot() keeps reporting the previous stop's wait time even though this stop waited 0ns. Clear stats_last_wait_ns before returning.

Proposed fix

         if initial_countdown == 0 {
+            self.stats_last_wait_ns.store(0, Ordering::Relaxed);
             self.world_stopped.store(true, Ordering::Release);
             #[cfg(debug_assertions)]
             self.debug_assert_all_non_requester_suspended(vm);
             stw_trace(format_args!(
                 "stop end requester={requester_ident} wait_ns=0 polls=0"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 332 - 339, The zero-thread fast path
returns without resetting the last-stop duration, so stats_snapshot() reports
stale wait time; in the initial_countdown == 0 branch reset
self.stats_last_wait_ns to zero (use the same atomic store pattern and Ordering
used elsewhere in this module) before the stw_trace/return (after
debug_assert_all_non_requester_suspended(vm) if present) so the reported wait_ns
is correctly 0 for this stop.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 994-997: The code currently drops the result of
crate::stdlib::warnings::warn (using let _ = ...) which prevents
warning-as-error filters from taking effect; change the call site to propagate
the PyResult by removing the discard and using the try operator (i.e. return the
warn() result), change warn_if_multi_threaded to return PyResult<()> (update its
signature and return points) and propagate that result at each call site (e.g.,
where warn_if_multi_threaded("fork", num_os_threads, vm) is invoked), and update
the surrounding comment to state that CPython propagates warning errors rather
than silently ignoring them.

---

Duplicate comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1211-1249: join_internal currently enforces "Starting" and
interpreter-finalizing checks (via Self::check_started and the
PythonFinalizationError branch) which causes _shutdown to abort while draining
threads; modify join_internal to accept a for_shutdown: bool parameter (or
remove those checks here) and move the Starting-state verification and the
vm.state.finalizing check into the public join() wrapper so normal joins still
validate but calls from _shutdown can call join_internal(..., for_shutdown=true)
(or call a new internal no-validate helper) to skip those two checks; update all
callers (public join() and _shutdown) to call the appropriate variant and ensure
unique symbols referenced: join_internal, check_started, ThreadHandle_join
semantic check, and _shutdown are adjusted accordingly.

In `@crates/vm/src/vm/mod.rs`:
- Around line 1932-1937: The read of the finalization gate in
eval_breaker_tripped() uses Ordering::Relaxed which can let a stale false skip
check_signals() on weak memory models; change the load to use Ordering::Acquire
(i.e., replace state.finalizing.load(Ordering::Relaxed) with
state.finalizing.load(Ordering::Acquire)) so that the subsequent
check_signals()'s synchronization is not bypassed, leaving all other logic in
eval_breaker_tripped() and callers like check_signals() unchanged.
- Around line 332-339: The zero-thread fast path returns without resetting the
last-stop duration, so stats_snapshot() reports stale wait time; in the
initial_countdown == 0 branch reset self.stats_last_wait_ns to zero (use the
same atomic store pattern and Ordering used elsewhere in this module) before the
stw_trace/return (after debug_assert_all_non_requester_suspended(vm) if present)
so the reported wait_ns is correctly 0 for this stop.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: a2aa3b12-c448-410a-8e2d-4b744c66252d

📥 Commits

Reviewing files that changed from the base of the PR and between d36a0b1 and 026eb24.

📒 Files selected for processing (3)

crates/vm/src/stdlib/posix.rs
crates/vm/src/stdlib/thread.rs
crates/vm/src/vm/mod.rs

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/vm/src/stdlib/posix.rs (1)

904-923: macOS: Calling task_for_pid on self is unnecessary overhead.

Using task_for_pid(mach_task_self(), getpid(), &mut proc_task) is redundant since mach_task_self() already returns the task port for the current process. Additionally, task_for_pid typically requires special entitlements (taskgated) when used for other processes, and while it works for self, it's an extra syscall.

♻️ Suggested simplification

-            let self_task = unsafe { mach_task_self() };
-            let mut proc_task: MachPortT = 0;
-            if unsafe { task_for_pid(self_task, libc::getpid(), &mut proc_task) } == KERN_SUCCESS {
-                let mut threads: ThreadActArrayT = core::ptr::null_mut();
-                let mut n_threads: MachMsgTypeNumberT = 0;
-                if unsafe { task_threads(proc_task, &mut threads, &mut n_threads) } == KERN_SUCCESS
+            let self_task = unsafe { mach_task_self() };
+            let mut threads: ThreadActArrayT = core::ptr::null_mut();
+            let mut n_threads: MachMsgTypeNumberT = 0;
+            if unsafe { task_threads(self_task, &mut threads, &mut n_threads) } == KERN_SUCCESS
+            {
+                if !threads.is_null() {
+                    let _ = unsafe {
+                        vm_deallocate(
+                            self_task,
+                            threads as libc::uintptr_t,
+                            (n_threads as usize * core::mem::size_of::<MachPortT>())
+                                as libc::uintptr_t,
+                        )
+                    };
+                }
+                return n_threads as isize;
+            }
+            0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 904 - 923, The code redundantly
calls task_for_pid(self_task, getpid(), &mut proc_task); replace that flow by
assigning proc_task = self_task (from mach_task_self()) and then call
task_threads(proc_task, &mut threads, &mut n_threads) as before; keep the
existing vm_deallocate call using self_task and threads, and return n_threads as
isize, removing the task_for_pid branch and its associated conditional while
preserving error handling around task_threads and vm_deallocate (symbols:
mach_task_self, proc_task, task_for_pid, task_threads, vm_deallocate,
n_threads).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 926-946: The current parsing of /proc/self/stat is fragile because
split_whitespace() will break when the comm field (inside parentheses) contains
spaces; instead, after reading the file and converting to string, find the index
of the last ')' in the line, take the substring after that character, then
split_whitespace() on that remainder and parse the desired field (the same field
you previously accessed with nth(19)) from the resultant iterator (preserve the
existing return behavior using parse::<isize>().unwrap_or(0)); replace the
direct line.split_whitespace().nth(19) usage with this "find last ')', take
substring after it, then split" approach to reliably locate the field even when
comm contains spaces.

---

Nitpick comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 904-923: The code redundantly calls task_for_pid(self_task,
getpid(), &mut proc_task); replace that flow by assigning proc_task = self_task
(from mach_task_self()) and then call task_threads(proc_task, &mut threads, &mut
n_threads) as before; keep the existing vm_deallocate call using self_task and
threads, and return n_threads as isize, removing the task_for_pid branch and its
associated conditional while preserving error handling around task_threads and
vm_deallocate (symbols: mach_task_self, proc_task, task_for_pid, task_threads,
vm_deallocate, n_threads).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: d8154526-8adb-41ea-84aa-3bb0378c4963

📥 Commits

Reviewing files that changed from the base of the PR and between 026eb24 and 939a1ff.

📒 Files selected for processing (1)

crates/vm/src/stdlib/posix.rs

- Fix park_detached_threads: successful CAS no longer sets all_suspended=false, avoiding unnecessary polling rounds - Replace park_timeout(50µs) with park() in wait_while_suspended - Remove redundant self-suspension in attach_thread and detach_thread; the STW controller handles DETACHED→SUSPENDED via park_detached_threads - Add double-check under mutex before condvar wait to prevent lost wakes - Remove dead stats_detach_wait_yields field and add_detach_wait_yields

Like CPython's ThreadHandle_start, set RUNNING state in the parent thread immediately after spawn() succeeds, rather than in the child. This eliminates a race where join() could see Starting state if called before the child thread executes. Also reverts the macOS skip for test_start_new_thread_failed since the root cause is fixed.

…ad_at_finalization

Add lock_wrapped to ThreadMutex for detaching thread state while waiting on contended locks. Use it for buffered and text IO locks. Wrap FileIO read/write in allow_threads via crt_fd to prevent STW hangs on blocking file operations.

Replace parking_lot Mutex/Condvar with std::sync (pthread-based) for started_event and handle_ready_event. This prevents hangs in forked children where parking_lot's global HASHTABLE may be corrupted.

* apply more allow_threads * Simplify STW thread state transitions - Fix park_detached_threads: successful CAS no longer sets all_suspended=false, avoiding unnecessary polling rounds - Replace park_timeout(50µs) with park() in wait_while_suspended - Remove redundant self-suspension in attach_thread and detach_thread; the STW controller handles DETACHED→SUSPENDED via park_detached_threads - Add double-check under mutex before condvar wait to prevent lost wakes - Remove dead stats_detach_wait_yields field and add_detach_wait_yields * Representable for ThreadHandle * Set ThreadHandle state to Running in parent thread after spawn Like CPython's ThreadHandle_start, set RUNNING state in the parent thread immediately after spawn() succeeds, rather than in the child. This eliminates a race where join() could see Starting state if called before the child thread executes. Also reverts the macOS skip for test_start_new_thread_failed since the root cause is fixed. * Set ThreadHandle state to Running in parent thread after spawn * Add debug_assert for thread state in start_the_world * Unskip now-passing test_get_event_loop_thread and test_start_new_thread_at_finalization * Wrap IO locks and file ops in allow_threads Add lock_wrapped to ThreadMutex for detaching thread state while waiting on contended locks. Use it for buffered and text IO locks. Wrap FileIO read/write in allow_threads via crt_fd to prevent STW hangs on blocking file operations. * Use std::sync for thread start/ready events Replace parking_lot Mutex/Condvar with std::sync (pthread-based) for started_event and handle_ready_event. This prevents hangs in forked children where parking_lot's global HASHTABLE may be corrupted.

youknowone force-pushed the implock branch 3 times, most recently from db453fc to b29c3b8 Compare March 7, 2026 15:36

youknowone marked this pull request as ready for review March 7, 2026 16:28

coderabbitai Bot reviewed Mar 7, 2026

View reviewed changes

youknowone force-pushed the implock branch from dc18133 to d36a0b1 Compare March 7, 2026 17:45

coderabbitai Bot reviewed Mar 7, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 8, 2026

View reviewed changes

apply more allow_threads

6217ad6

Representable for ThreadHandle

396fec9

Set ThreadHandle state to Running in parent thread after spawn

e37b15f

Add debug_assert for thread state in start_the_world

cfd7f01

Unskip now-passing test_get_event_loop_thread and test_start_new_thre… …

1b97419

…ad_at_finalization

Wrap IO locks and file ops in allow_threads …

1ba2821

Add lock_wrapped to ThreadMutex for detaching thread state while waiting on contended locks. Use it for buffered and text IO locks. Wrap FileIO read/write in allow_threads via crt_fd to prevent STW hangs on blocking file operations.

Use std::sync for thread start/ready events …

ca41577

Replace parking_lot Mutex/Condvar with std::sync (pthread-based) for started_event and handle_ready_event. This prevents hangs in forked children where parking_lot's global HASHTABLE may be corrupted.

youknowone force-pushed the implock branch from 939a1ff to ca41577 Compare March 8, 2026 08:16

youknowone deleted the implock branch March 8, 2026 09:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More fork safety#7380