More fork safety#7380
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a lock_wrapped API and uses it with vm.allow_threads to release the VM/GIL during blocking waits; refactors stop-the-world state to world_stopped plus a thread_countdown, adds per-thread stop_requested, tightens signal-exception gating, and overhauls thread start/join/shutdown lifecycles. Changes
Sequence Diagram(s)sequenceDiagram
participant Py as Python code (caller)
participant VM as VirtualMachine
participant Mutex as ThreadMutex/RawThreadMutex
participant OS as OS blocking syscall
Py->>VM: request lock_wrapped(wrap_fn)
VM->>Mutex: lock_wrapped(wrap_fn)
Mutex->>VM: invoke wrap_fn(do_lock)
VM->>OS: vm.allow_threads(do_lock) -- releases VM/GIL
OS-->>Mutex: blocking wait returns
Mutex-->>VM: set owner and return
VM-->>Py: return guard / success
Estimated code review effort🎯 5 (Critical) | ⏱️ ~110 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
db453fc to
b29c3b8
Compare
March 7, 2026 15:36
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] lib: cpython/Lib/asyncio dependencies:
dependent tests: (7 tests)
[x] lib: cpython/Lib/threading.py dependencies:
dependent tests: (148 tests)
Legend:
|
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/vm/thread.rs (1)
187-209: ⚠️ Potential issue | 🔴 CriticalDetaching a requested thread needs to wake the stop-the-world requester.
When
park_detached_threads()marks an ATTACHED thread withstop_requested = true, a subsequent call todetach_thread()(viaallow_threads()) can transition it to THREAD_DETACHED without notifying the requester. This leaves the thread countdown outstanding. The blockedstop_the_world()loop will only advance when either the countdown reaches zero orpark_detached_threads()is called again to rescan the detached thread, causing unnecessary stalling.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/thread.rs` around lines 187 - 209, detaching an ATTACHED thread can race with a prior park_detached_threads() that set stop_requested=true and so detach_thread() must notify the stop-the-world requester; update detach_thread() (inside CURRENT_THREAD_SLOT handling when compare_exchange succeeds going from THREAD_ATTACHED to THREAD_DETACHED) to check the thread's stop_requested flag and, if true, call the same notification used by park_detached_threads()/stop_the_world (i.e., invoke the stop-the-world wake/notify function used elsewhere) so the stop_the_world() countdown/loop is advanced when a requested thread is detached.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/io.rs`:
- Line 5339: The code currently uses std::io::Read::read_to_end(), which hides
EINTR and prevents vm.check_signals() from running on the unbounded-read path
(size < 0); replace that call with an explicit chunked loop that uses
crt_fd::read() to fill a Vec<u8> (read fixed-size chunks, e.g., 8KiB), append
each successful chunk to the buffer, break on EOF (0 bytes), and on read errors
return Err so EINTR is propagated to the outer error handler; make sure to
reference and replace the read_to_end() call site and keep vm.check_signals()
semantics by letting crt_fd::read() surface EINTR instead of swallowing it.
In `@crates/vm/src/vm/mod.rs`:
- Around line 334-343: The fast-return path when initial_countdown == 0 leaves
stats_last_wait_ns unchanged; update the code in the branch (inside the block
after self.world_stopped.store(...) and before returning) to reset the VM's
last-wait statistic (the atomic backing field used by
stats_last_wait_ns/stats_snapshot) to zero using the same atomic semantics used
elsewhere (e.g., an atomic store with appropriate Ordering), so stats_snapshot()
does not report the previous stop's wait time after a no-wait stop; locate the
change around init_thread_countdown and world_stopped in stop() in vm::mod.rs.
- Around line 1931-1949: The fast-path in eval_breaker_tripped() performs a
Relaxed load of state.finalizing which can bypass the later Acquire load in
check_signals(), breaking the Release→Acquire synchronization from the shutdown
write; change the atomic load on state.finalizing in eval_breaker_tripped() to
Ordering::Acquire (or otherwise ensure an Acquire-load or appropriate fence is
executed when checking finalizing) so that the shutdown write in interpreter.rs
(Release) becomes visible on other threads; references: eval_breaker_tripped,
state.finalizing, check_signals, and the shutdown write in interpreter.rs.
In `@crates/vm/src/vm/thread.rs`:
- Around line 426-462: The cleanup_current_thread_frames path can call
vm.state.stop_the_world.notify_thread_gone() for threads that were
THREAD_DETACHED before the stop-the-world began; only decrement the requester
countdown for threads that were counted by init_thread_countdown. Update the
final conditional in cleanup_current_thread_frames to also ensure the removed
slot's state is not THREAD_DETACHED (e.g. add slot.state.load(...) !=
THREAD_DETACHED alongside the existing THREAD_SUSPENDED check) before calling
stop_the_world.notify_thread_gone(); reference symbols:
cleanup_current_thread_frames, vm.state.stop_the_world.notify_thread_gone,
init_thread_countdown, THREAD_DETACHED, THREAD_SUSPENDED, slot.state.
- Around line 247-252: Reads of the stop-request bit in suspend_if_needed and
stop_requested_for_current_thread currently use Ordering::Relaxed; change those
.load(Ordering::Relaxed) calls on the stop_requested AtomicBool to
.load(Ordering::Acquire) so they synchronize with park_detached_threads() which
publishes the bit with Ordering::Release (ensuring eval_breaker_tripped() sees
the update and threads suspend correctly).
---
Outside diff comments:
In `@crates/vm/src/vm/thread.rs`:
- Around line 187-209: detaching an ATTACHED thread can race with a prior
park_detached_threads() that set stop_requested=true and so detach_thread() must
notify the stop-the-world requester; update detach_thread() (inside
CURRENT_THREAD_SLOT handling when compare_exchange succeeds going from
THREAD_ATTACHED to THREAD_DETACHED) to check the thread's stop_requested flag
and, if true, call the same notification used by
park_detached_threads()/stop_the_world (i.e., invoke the stop-the-world
wake/notify function used elsewhere) so the stop_the_world() countdown/loop is
advanced when a requested thread is detached.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 9daf3542-35f7-4fa9-8ecd-777f0f7c9a5d
⛔ Files ignored due to path filters (2)
Lib/test/test_asyncio/test_events.pyis excluded by!Lib/**Lib/test/test_threading.pyis excluded by!Lib/**
📒 Files selected for processing (9)
crates/common/src/lock/thread_mutex.rscrates/stdlib/src/multiprocessing.rscrates/stdlib/src/select.rscrates/vm/src/frame.rscrates/vm/src/signal.rscrates/vm/src/stdlib/io.rscrates/vm/src/stdlib/thread.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/thread.rs
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1210-1247: The join_internal helper currently enforces public join
validations (calls check_started and checks VM finalization/current-ident) which
breaks callers from _shutdown; update join_internal to accept an internal flag
(e.g., for_shutdown: bool) or a separate private helper so _shutdown can bypass
public-only checks: remove or guard the check_started(inner, vm) call and the
finalization/current-thread checks inside join_internal (references:
join_internal, check_started, _shutdown, ThreadHandle_join, get_ident,
ThreadHandleState::Running, vm.state.finalizing) and ensure the public join()
wrapper still performs check_started and finalization checks before delegating
to join_internal(for_shutdown=false) while _shutdown calls
join_internal(for_shutdown=true) to skip those validations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 300d9b9c-9e25-44bf-b7b6-e9cd4c933f3a
📒 Files selected for processing (1)
crates/vm/src/stdlib/thread.rs
Sorry, something went wrong.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/vm/src/stdlib/thread.rs (1)
1210-1247: ⚠️ Potential issue | 🔴 CriticalKeep
_shutdown()off the publicjoin()validation path.
join_internal()still runscheck_started()and the finalization guard even when_shutdown()is the caller. That means shutdown can legitimately hitStarting/NotStartedhandles from the registry, or trip thefinalizingcheck, and then_shutdown()logs the exception and returns early instead of waiting for the remaining non-daemon threads. The public checks need to stay injoin(), orjoin_internal()needs an internal mode that skips them for shutdown/fork cleanup callers.Possible direction
- fn join_internal( + fn join_internal( inner: &Arc<parking_lot::Mutex<ThreadHandleInner>>, done_event: &Arc<(parking_lot::Mutex<bool>, parking_lot::Condvar)>, timeout_duration: Option<Duration>, + public_checks: bool, vm: &VirtualMachine, ) -> PyResult<()> { - Self::check_started(inner, vm)?; + if public_checks { + Self::check_started(inner, vm)?; + } let deadline = timeout_duration.and_then(|timeout| std::time::Instant::now().checked_add(timeout)); let (lock, cvar) = &**done_event; let mut done = lock.lock(); - if !*done { + if public_checks && !*done { let inner_guard = inner.lock(); let current_ident = get_ident(); if inner_guard.ident == current_ident && inner_guard.state == ThreadHandleState::Running { return Err(vm.new_runtime_error("Cannot join current thread")); } if vm.state.finalizing.load(core::sync::atomic::Ordering::Acquire) { return Err(...); } }- Self::join_internal(&self.inner, &self.done_event, timeout_duration, vm) + Self::join_internal(&self.inner, &self.done_event, timeout_duration, true, vm)- if let Err(exc) = ThreadHandle::join_internal(&inner, &done_event, None, vm) { + if let Err(exc) = + ThreadHandle::join_internal(&inner, &done_event, None, false, vm) + {Also applies to: 1314-1325
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/thread.rs` around lines 1210 - 1247, The join_internal() path runs public validations (check_started() and the vm.state.finalizing guard) even when invoked from internal shutdown/fork cleanup, so change the call boundary: move the public checks back into the public join() method and/or add an internal-only mode to join_internal() (e.g., add a bool flag like skip_public_checks or is_shutdown_call to join_internal(inner, done_event, timeout_duration, vm, skip_public_checks: bool)), have join() call join_internal(..., false) and _shutdown()/fork cleanup call join_internal(..., true), and inside join_internal() skip check_started() and the finalizing check when skip_public_checks is true; update all callers (including _shutdown()) to use the new flag and keep ThreadHandleState/ThreadHandle_join semantics intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1210-1247: The join_internal() path runs public validations
(check_started() and the vm.state.finalizing guard) even when invoked from
internal shutdown/fork cleanup, so change the call boundary: move the public
checks back into the public join() method and/or add an internal-only mode to
join_internal() (e.g., add a bool flag like skip_public_checks or
is_shutdown_call to join_internal(inner, done_event, timeout_duration, vm,
skip_public_checks: bool)), have join() call join_internal(..., false) and
_shutdown()/fork cleanup call join_internal(..., true), and inside
join_internal() skip check_started() and the finalizing check when
skip_public_checks is true; update all callers (including _shutdown()) to use
the new flag and keep ThreadHandleState/ThreadHandle_join semantics intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: de154942-d61b-4972-bdb4-43be2e906d06
📒 Files selected for processing (1)
crates/vm/src/stdlib/thread.rs
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
crates/vm/src/stdlib/thread.rs (1)
1211-1249: ⚠️ Potential issue | 🔴 CriticalKeep shutdown waits off the public
join()validation path.
join_internal()still rejectsStartinghandles at Line 1217 and still raisesPythonFinalizationErrorat Lines 1237-1248._shutdown()legitimately hits both states while draining non-daemon threads, so this helper can still abort shutdown early and leave threads unjoined. Move those checks into the publicjoin()wrapper, or add an internalfor_shutdownmode that skips them.Refactor sketch
- fn join_internal( + fn join_internal( inner: &Arc<parking_lot::Mutex<ThreadHandleInner>>, done_event: &Arc<(parking_lot::Mutex<bool>, parking_lot::Condvar)>, timeout_duration: Option<Duration>, + for_shutdown: bool, vm: &VirtualMachine, ) -> PyResult<()> { - Self::check_started(inner, vm)?; + if !for_shutdown { + Self::check_started(inner, vm)?; + } let deadline = timeout_duration.and_then(|timeout| std::time::Instant::now().checked_add(timeout)); let (lock, cvar) = &**done_event; let mut done = lock.lock(); - if !*done { + if !*done && !for_shutdown { let inner_guard = inner.lock(); let current_ident = get_ident(); if inner_guard.ident == current_ident && inner_guard.state == ThreadHandleState::Running { @@ - Self::join_internal(&self.inner, &self.done_event, timeout_duration, vm) + Self::check_started(&self.inner, vm)?; + Self::join_internal(&self.inner, &self.done_event, timeout_duration, false, vm)- if let Err(exc) = ThreadHandle::join_internal(&inner, &done_event, None, vm) { + if let Err(exc) = + ThreadHandle::join_internal(&inner, &done_event, None, true, vm) + {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/thread.rs` around lines 1211 - 1249, join_internal currently enforces "Starting" and interpreter-finalizing checks (via Self::check_started and the PythonFinalizationError branch) which causes _shutdown to abort while draining threads; modify join_internal to accept a for_shutdown: bool parameter (or remove those checks here) and move the Starting-state verification and the vm.state.finalizing check into the public join() wrapper so normal joins still validate but calls from _shutdown can call join_internal(..., for_shutdown=true) (or call a new internal no-validate helper) to skip those two checks; update all callers (public join() and _shutdown) to call the appropriate variant and ensure unique symbols referenced: join_internal, check_started, ThreadHandle_join semantic check, and _shutdown are adjusted accordingly.crates/vm/src/vm/mod.rs (2)
1932-1937: ⚠️ Potential issue | 🔴 CriticalUse
Acquirefor the finalization gate ineval_breaker_tripped().Line 1935 still does a Relaxed read even though this branch decides whether
check_signals()runs at all. A stalefalsehere skips the later Acquire load incheck_signals(), so non-main threads can miss finalization on weak memory models.Proposed fix
#[cfg(feature = "threading")] - if self.state.finalizing.load(Ordering::Relaxed) && !self.is_main_thread() { + if self.state.finalizing.load(Ordering::Acquire) && !self.is_main_thread() { return true; }#!/bin/bash set -euo pipefail printf '\n== eval_breaker_tripped ==\n' sed -n '1932,1958p' crates/vm/src/vm/mod.rs printf '\n== bytecode-loop gate ==\n' sed -n '1198,1206p' crates/vm/src/frame.rs printf '\n== finalizing loads/stores ==\n' rg -n -C2 'finalizing\.(load|store)\(' crates/vm/src🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 1932 - 1937, The read of the finalization gate in eval_breaker_tripped() uses Ordering::Relaxed which can let a stale false skip check_signals() on weak memory models; change the load to use Ordering::Acquire (i.e., replace state.finalizing.load(Ordering::Relaxed) with state.finalizing.load(Ordering::Acquire)) so that the subsequent check_signals()'s synchronization is not bypassed, leaving all other logic in eval_breaker_tripped() and callers like check_signals() unchanged.
332-339: ⚠️ Potential issue | 🟡 MinorReset
stats_last_wait_nson the zero-thread fast path.When Line 332 returns immediately,
stats_snapshot()keeps reporting the previous stop's wait time even though this stop waited0ns. Clearstats_last_wait_nsbefore returning.Proposed fix
if initial_countdown == 0 { + self.stats_last_wait_ns.store(0, Ordering::Relaxed); self.world_stopped.store(true, Ordering::Release); #[cfg(debug_assertions)] self.debug_assert_all_non_requester_suspended(vm); stw_trace(format_args!( "stop end requester={requester_ident} wait_ns=0 polls=0"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 332 - 339, The zero-thread fast path returns without resetting the last-stop duration, so stats_snapshot() reports stale wait time; in the initial_countdown == 0 branch reset self.stats_last_wait_ns to zero (use the same atomic store pattern and Ordering used elsewhere in this module) before the stw_trace/return (after debug_assert_all_non_requester_suspended(vm) if present) so the reported wait_ns is correctly 0 for this stop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 994-997: The code currently drops the result of
crate::stdlib::warnings::warn (using let _ = ...) which prevents
warning-as-error filters from taking effect; change the call site to propagate
the PyResult by removing the discard and using the try operator (i.e. return the
warn() result), change warn_if_multi_threaded to return PyResult<()> (update its
signature and return points) and propagate that result at each call site (e.g.,
where warn_if_multi_threaded("fork", num_os_threads, vm) is invoked), and update
the surrounding comment to state that CPython propagates warning errors rather
than silently ignoring them.
---
Duplicate comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1211-1249: join_internal currently enforces "Starting" and
interpreter-finalizing checks (via Self::check_started and the
PythonFinalizationError branch) which causes _shutdown to abort while draining
threads; modify join_internal to accept a for_shutdown: bool parameter (or
remove those checks here) and move the Starting-state verification and the
vm.state.finalizing check into the public join() wrapper so normal joins still
validate but calls from _shutdown can call join_internal(..., for_shutdown=true)
(or call a new internal no-validate helper) to skip those two checks; update all
callers (public join() and _shutdown) to call the appropriate variant and ensure
unique symbols referenced: join_internal, check_started, ThreadHandle_join
semantic check, and _shutdown are adjusted accordingly.
In `@crates/vm/src/vm/mod.rs`:
- Around line 1932-1937: The read of the finalization gate in
eval_breaker_tripped() uses Ordering::Relaxed which can let a stale false skip
check_signals() on weak memory models; change the load to use Ordering::Acquire
(i.e., replace state.finalizing.load(Ordering::Relaxed) with
state.finalizing.load(Ordering::Acquire)) so that the subsequent
check_signals()'s synchronization is not bypassed, leaving all other logic in
eval_breaker_tripped() and callers like check_signals() unchanged.
- Around line 332-339: The zero-thread fast path returns without resetting the
last-stop duration, so stats_snapshot() reports stale wait time; in the
initial_countdown == 0 branch reset self.stats_last_wait_ns to zero (use the
same atomic store pattern and Ordering used elsewhere in this module) before the
stw_trace/return (after debug_assert_all_non_requester_suspended(vm) if present)
so the reported wait_ns is correctly 0 for this stop.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: a2aa3b12-c448-410a-8e2d-4b744c66252d
📒 Files selected for processing (3)
crates/vm/src/stdlib/posix.rscrates/vm/src/stdlib/thread.rscrates/vm/src/vm/mod.rs
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/vm/src/stdlib/posix.rs (1)
904-923: macOS: Callingtask_for_pidon self is unnecessary overhead.Using
task_for_pid(mach_task_self(), getpid(), &mut proc_task)is redundant sincemach_task_self()already returns the task port for the current process. Additionally,task_for_pidtypically requires special entitlements (taskgated) when used for other processes, and while it works for self, it's an extra syscall.♻️ Suggested simplification
- let self_task = unsafe { mach_task_self() }; - let mut proc_task: MachPortT = 0; - if unsafe { task_for_pid(self_task, libc::getpid(), &mut proc_task) } == KERN_SUCCESS { - let mut threads: ThreadActArrayT = core::ptr::null_mut(); - let mut n_threads: MachMsgTypeNumberT = 0; - if unsafe { task_threads(proc_task, &mut threads, &mut n_threads) } == KERN_SUCCESS + let self_task = unsafe { mach_task_self() }; + let mut threads: ThreadActArrayT = core::ptr::null_mut(); + let mut n_threads: MachMsgTypeNumberT = 0; + if unsafe { task_threads(self_task, &mut threads, &mut n_threads) } == KERN_SUCCESS + { + if !threads.is_null() { + let _ = unsafe { + vm_deallocate( + self_task, + threads as libc::uintptr_t, + (n_threads as usize * core::mem::size_of::<MachPortT>()) + as libc::uintptr_t, + ) + }; + } + return n_threads as isize; + } + 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/posix.rs` around lines 904 - 923, The code redundantly calls task_for_pid(self_task, getpid(), &mut proc_task); replace that flow by assigning proc_task = self_task (from mach_task_self()) and then call task_threads(proc_task, &mut threads, &mut n_threads) as before; keep the existing vm_deallocate call using self_task and threads, and return n_threads as isize, removing the task_for_pid branch and its associated conditional while preserving error handling around task_threads and vm_deallocate (symbols: mach_task_self, proc_task, task_for_pid, task_threads, vm_deallocate, n_threads).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 926-946: The current parsing of /proc/self/stat is fragile because
split_whitespace() will break when the comm field (inside parentheses) contains
spaces; instead, after reading the file and converting to string, find the index
of the last ')' in the line, take the substring after that character, then
split_whitespace() on that remainder and parse the desired field (the same field
you previously accessed with nth(19)) from the resultant iterator (preserve the
existing return behavior using parse::<isize>().unwrap_or(0)); replace the
direct line.split_whitespace().nth(19) usage with this "find last ')', take
substring after it, then split" approach to reliably locate the field even when
comm contains spaces.
---
Nitpick comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 904-923: The code redundantly calls task_for_pid(self_task,
getpid(), &mut proc_task); replace that flow by assigning proc_task = self_task
(from mach_task_self()) and then call task_threads(proc_task, &mut threads, &mut
n_threads) as before; keep the existing vm_deallocate call using self_task and
threads, and return n_threads as isize, removing the task_for_pid branch and its
associated conditional while preserving error handling around task_threads and
vm_deallocate (symbols: mach_task_self, proc_task, task_for_pid, task_threads,
vm_deallocate, n_threads).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d8154526-8adb-41ea-84aa-3bb0378c4963
📒 Files selected for processing (1)
crates/vm/src/stdlib/posix.rs
Sorry, something went wrong.
- Fix park_detached_threads: successful CAS no longer sets all_suspended=false, avoiding unnecessary polling rounds - Replace park_timeout(50µs) with park() in wait_while_suspended - Remove redundant self-suspension in attach_thread and detach_thread; the STW controller handles DETACHED→SUSPENDED via park_detached_threads - Add double-check under mutex before condvar wait to prevent lost wakes - Remove dead stats_detach_wait_yields field and add_detach_wait_yields
Like CPython's ThreadHandle_start, set RUNNING state in the parent thread immediately after spawn() succeeds, rather than in the child. This eliminates a race where join() could see Starting state if called before the child thread executes. Also reverts the macOS skip for test_start_new_thread_failed since the root cause is fixed.
…ad_at_finalization
Add lock_wrapped to ThreadMutex for detaching thread state while waiting on contended locks. Use it for buffered and text IO locks. Wrap FileIO read/write in allow_threads via crt_fd to prevent STW hangs on blocking file operations.
Replace parking_lot Mutex/Condvar with std::sync (pthread-based) for started_event and handle_ready_event. This prevents hangs in forked children where parking_lot's global HASHTABLE may be corrupted.
45d8129
into
RustPython:main
Mar 8, 2026
* apply more allow_threads * Simplify STW thread state transitions - Fix park_detached_threads: successful CAS no longer sets all_suspended=false, avoiding unnecessary polling rounds - Replace park_timeout(50µs) with park() in wait_while_suspended - Remove redundant self-suspension in attach_thread and detach_thread; the STW controller handles DETACHED→SUSPENDED via park_detached_threads - Add double-check under mutex before condvar wait to prevent lost wakes - Remove dead stats_detach_wait_yields field and add_detach_wait_yields * Representable for ThreadHandle * Set ThreadHandle state to Running in parent thread after spawn Like CPython's ThreadHandle_start, set RUNNING state in the parent thread immediately after spawn() succeeds, rather than in the child. This eliminates a race where join() could see Starting state if called before the child thread executes. Also reverts the macOS skip for test_start_new_thread_failed since the root cause is fixed. * Set ThreadHandle state to Running in parent thread after spawn * Add debug_assert for thread state in start_the_world * Unskip now-passing test_get_event_loop_thread and test_start_new_thread_at_finalization * Wrap IO locks and file ops in allow_threads Add lock_wrapped to ThreadMutex for detaching thread state while waiting on contended locks. Use it for buffered and text IO locks. Wrap FileIO read/write in allow_threads via crt_fd to prevent STW hangs on blocking file operations. * Use std::sync for thread start/ready events Replace parking_lot Mutex/Condvar with std::sync (pthread-based) for started_event and handle_ready_event. This prevents hangs in forked children where parking_lot's global HASHTABLE may be corrupted.
* apply more allow_threads * Simplify STW thread state transitions - Fix park_detached_threads: successful CAS no longer sets all_suspended=false, avoiding unnecessary polling rounds - Replace park_timeout(50µs) with park() in wait_while_suspended - Remove redundant self-suspension in attach_thread and detach_thread; the STW controller handles DETACHED→SUSPENDED via park_detached_threads - Add double-check under mutex before condvar wait to prevent lost wakes - Remove dead stats_detach_wait_yields field and add_detach_wait_yields * Representable for ThreadHandle * Set ThreadHandle state to Running in parent thread after spawn Like CPython's ThreadHandle_start, set RUNNING state in the parent thread immediately after spawn() succeeds, rather than in the child. This eliminates a race where join() could see Starting state if called before the child thread executes. Also reverts the macOS skip for test_start_new_thread_failed since the root cause is fixed. * Set ThreadHandle state to Running in parent thread after spawn * Add debug_assert for thread state in start_the_world * Unskip now-passing test_get_event_loop_thread and test_start_new_thread_at_finalization * Wrap IO locks and file ops in allow_threads Add lock_wrapped to ThreadMutex for detaching thread state while waiting on contended locks. Use it for buffered and text IO locks. Wrap FileIO read/write in allow_threads via crt_fd to prevent STW hangs on blocking file operations. * Use std::sync for thread start/ready events Replace parking_lot Mutex/Condvar with std::sync (pthread-based) for started_event and handle_ready_event. This prevents hangs in forked children where parking_lot's global HASHTABLE may be corrupted.
Summary by CodeRabbit
New Features
Bug Fixes
Performance