Optimize fast_locals and atomic ordering#7289
Conversation
- inc/inc_by/get: SeqCst → Relaxed - safe_inc CAS: SeqCst → Relaxed + compare_exchange_weak - dec: SeqCst → Release + Acquire fence when count drops to 0 - leak CAS: SeqCst → AcqRel/Relaxed + compare_exchange_weak
Replace vec![self_val] + extend(args.args) with FuncArgs::prepend_arg() to avoid a second heap allocation on every method call.
Early return in PyCallable::invoke() when use_tracing is false, avoiding two downcast_ref type checks on every function call.
📝 WalkthroughWalkthroughThis PR introduces lock-free frame locals storage via a new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin worktree-perf-step1 |
Sorry, something went wrong.
Eliminate per-instruction mutex lock/unlock overhead for local variable access. FastLocals uses UnsafeCell with safety guaranteed by the frame's state mutex and sequential same-thread execution. Affects 14+ lock() call sites in hot instruction paths (LoadFast, StoreFast, DeleteFast, and their paired variants).
2a5b8ce to
958853f
Compare
March 1, 2026 09:26
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/common/src/refcount.rs (1)
124-137: ⚠️ Potential issue | 🟠 MajorUse
Acquireon successfulsafe_incCAS for weak→strong upgrade synchronization.The
safe_incmethod is used in weak reference upgrade paths (e.g.,PyWeak::upgradeat line 688 incrates/vm/src/object/core.rs). Currently, line 136 usesOrdering::Relaxedon successful CAS, which provides no synchronization guarantees.When upgrading from weak to strong, the CAS success must acquire visibility of memory written by previous strong reference holders. The
dec()method usesOrdering::Release(line 149), establishing a release-acquire synchronization point. The successfulsafe_incmust useOrdering::Acquireto properly pair with this Release and ensure visibility of modifications made under the previous strong reference.Proposed patch
match self.state.compare_exchange_weak( old.as_raw(), new_state.as_raw(), - Ordering::Relaxed, + Ordering::Acquire, Ordering::Relaxed, ) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 124 - 137, The CAS in safe_inc (the compare_exchange_weak call that uses old.as_raw/new_state.as_raw) currently uses Ordering::Relaxed for the success case; change the success ordering to Ordering::Acquire while keeping the failure ordering relaxed, so the weak→strong upgrade acquires the release from dec() and sees prior writes. Locate safe_inc, the State::from_raw / add_strong usage and adjust the compare_exchange_weak success ordering to Acquire.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 96-133: The FastLocals UnsafeCell can be accessed concurrently via
the Python-exposed f_locals() path (which calls locals()), violating the
assumption that access is serialized by with_exec(); to fix, either (A) remove
the unsafe impls unsafe impl Send/Sync for FastLocals and add runtime assertions
to prevent cross-thread access, or (B) gate all fastlocals borrows behind the
frame execution guard/token (have f_locals() acquire the same with_exec() guard
or a borrow token before calling locals()/FastLocals::borrow/borrow_mut), or (C)
add a runtime thread-check that panics on unsynchronized cross-thread access;
update the FastLocals docs to reflect the chosen model and adjust callers
(f_locals, locals, and any direct FastLocals usage) to obtain the guard/token or
to compile without Send/Sync accordingly.
---
Outside diff comments:
In `@crates/common/src/refcount.rs`:
- Around line 124-137: The CAS in safe_inc (the compare_exchange_weak call that
uses old.as_raw/new_state.as_raw) currently uses Ordering::Relaxed for the
success case; change the success ordering to Ordering::Acquire while keeping the
failure ordering relaxed, so the weak→strong upgrade acquires the release from
dec() and sees prior writes. Locate safe_inc, the State::from_raw / add_strong
usage and adjust the compare_exchange_weak success ordering to Acquire.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
crates/common/src/refcount.rscrates/vm/src/builtins/frame.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/super.rscrates/vm/src/frame.rscrates/vm/src/protocol/callable.rs
Sorry, something went wrong.
2b08445
into
RustPython:main
Mar 1, 2026
* Relax RefCount atomic ordering from SeqCst to Arc pattern - inc/inc_by/get: SeqCst → Relaxed - safe_inc CAS: SeqCst → Relaxed + compare_exchange_weak - dec: SeqCst → Release + Acquire fence when count drops to 0 - leak CAS: SeqCst → AcqRel/Relaxed + compare_exchange_weak * Reuse existing Vec via prepend_arg in execute_call Replace vec![self_val] + extend(args.args) with FuncArgs::prepend_arg() to avoid a second heap allocation on every method call. * Skip downcast_ref checks in invoke when tracing is disabled Early return in PyCallable::invoke() when use_tracing is false, avoiding two downcast_ref type checks on every function call. * Replace fastlocals PyMutex with UnsafeCell-based FastLocals Eliminate per-instruction mutex lock/unlock overhead for local variable access. FastLocals uses UnsafeCell with safety guaranteed by the frame's state mutex and sequential same-thread execution. Affects 14+ lock() call sites in hot instruction paths (LoadFast, StoreFast, DeleteFast, and their paired variants). * Auto-format: cargo fmt --all --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary by CodeRabbit
Release Notes