Specialized ops#7322
Conversation
📝 WalkthroughWalkthroughThis PR makes bytecode slots atomic for concurrent specialization, centralizes adaptive inline-caching and deoptimization flows, moves JIT cache storage to a mutex-protected optional, strengthens several atomic ordering choices across types/dicts, and distinguishes Python-level methods from native slots in slot lookup/update logic. Changes
Sequence Diagram(s)sequenceDiagram
participant Frame as ExecutingFrame
participant CodeUnits as Code Storage
participant Adaptive as Adaptive Engine
participant Specialized as Specialized Handler
participant Base as Base Handler
Frame->>CodeUnits: read_op(index) (Acquire)
CodeUnits-->>Frame: opcode
Frame->>Adaptive: check adaptive counter & candidate (Relaxed)
Adaptive->>Adaptive: increment counter (Relaxed)
Adaptive->>Specialized: attempt specialization
alt Success
Specialized->>CodeUnits: compare_exchange_op(index, expected, new) (Release)
CodeUnits-->>Specialized: CAS success
Specialized-->>Frame: execute specialized path
else Fail
Specialized->>CodeUnits: write base opcode (Release)
CodeUnits-->>Base: base opcode written
Specialized-->>Frame: fall back to base handler
end
sequenceDiagram
participant Setter as PyType SetAttr
participant Version as Version Tag
participant Readers as Concurrent Readers
Setter->>Version: modified() (SeqCst store)
Version->>Readers: readers load version (Acquire)
Setter->>Setter: perform attribute insert/remove
Setter-->>Readers: changes ordered via version operations
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] lib: cpython/Lib/doctest.py dependencies:
dependent tests: (33 tests)
Legend:
|
Sorry, something went wrong.
b3456e0 to
2954b60
Compare
March 3, 2026 19:46
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/builtins/type.rs (1)
2171-2196: ⚠️ Potential issue | 🔴 CriticalPost-mutation cache invalidation is missing after pre-invalidation.
At Line 2174,
zelf.modified()runs before the attribute write.find_name_in_mro()can reassigntp_version_tagand repopulate cache entries from pre-mutation state in that window. Since this path has no second invalidation after commit, stale cache entries can survive.💡 Proposed fix (keep pre-invalidate, add post-commit invalidate)
// Invalidate inline caches before modifying attributes. // This ensures other threads see the version invalidation before // any attribute changes, preventing use-after-free of cached descriptors. zelf.modified(); if let PySetterValue::Assign(value) = value { zelf.attributes.write().insert(attr_name, value); } else { let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable? if prev_value.is_none() { return Err(vm.new_attribute_error(format!( "type object '{}' has no attribute '{}'", zelf.name(), attr_name, ))); } } if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") { if assign { zelf.update_slot::<true>(attr_name, &vm.ctx); } else { zelf.update_slot::<false>(attr_name, &vm.ctx); } } + // Flush any entries repopulated between pre-invalidation and commit. + zelf.modified(); Ok(())#!/bin/bash set -euo pipefail # Verify invalidation placement in setattro rg -n -C8 'fn setattro|modified\(\)|attributes\.write\(\)\.(insert|shift_remove)' crates/vm/src/builtins/type.rs # Verify cache/version reassignment path that can run concurrently rg -n -C8 'fn find_name_in_mro|assign_version_tag|tp_version_tag\.load|tp_version_tag\.store' crates/vm/src/builtins/type.rsExpected result:
modified()appears before mutation insetattro, and no post-mutation invalidation currently exists.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 2171 - 2196, The pre-mutation cache invalidation (zelf.modified()) in setattro is currently invoked before the attribute write but there is no post-commit invalidation, allowing find_name_in_mro to repopulate caches from the old state; fix this by keeping the existing pre-invalidation and adding a second invalidation immediately after the mutation commit (after attributes.write().insert or shift_remove completes and before calling update_slot::<...>), i.e., invoke zelf.modified() again to ensure tp_version_tag/cache entries are refreshed and prevent stale cache reuse.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 509-510: The code casts CodeUnit pointers to *const AtomicU16 (see
usages like units.as_ptr().wrapping_add(index) as *const AtomicU16 and the
subsequent .store calls), but CodeUnit is currently #[repr(C)] with two u8
fields and only 1-byte alignment, which violates atomic alignment requirements;
fix by changing the struct declaration to #[repr(C, align(2))] for CodeUnit and
add a compile-time assertion to ensure alignment (e.g.
assert_eq!(std::mem::align_of::<CodeUnit>(), 2) or a const_assert) so the
AtomicU16 casts are safe.
- Around line 485-499: The public safe readers (read_op, read_arg,
read_adaptive_counter) currently compute raw pointers with wrapping_add(index)
and dereference without bounds checks, which can cause UB on out-of-range
indices; update each function to load units first (as in read_cache_u16) and
assert the index (and any additional byte offsets, e.g., index+1 for arg) is
within units.len() before computing pointers and dereferencing so out-of-bounds
inputs panic instead of producing UB, keeping the existing pointer/dereference
logic otherwise.
In `@crates/vm/src/frame.rs`:
- Around line 4717-4723: The specialized LoadGlobalBuiltin path incorrectly
accepts dict subclasses via downcast_ref::<PyDict>(), which can bypass subclass
mapping; change the guard to require an exact PyDict type before doing the
direct dict lookup: first check that self.builtins is exactly the dict type
(e.g. via an exact-type check against vm.ctx.types.dict_type or a
PyDict::is_exact method), only then downcast_ref::<PyDict>() and use
cached_builtins_ver/current_builtins_ver and the fast get_item_opt path;
otherwise fall back to the generic global lookup; apply the same exact-type
check change to the other occurrence mentioned (around the 7313–7319 region).
- Around line 2442-2445: The current synchronization using
self.code.quickened.swap(true, atomic::Ordering::Relaxed) plus
atomic::fence(atomic::Ordering::Release) is misleading because there is no
matching Acquire reader; update the quickening logic in frame.rs: either remove
the explicit atomics (drop the swap + fence) if the actual memory ordering is
already provided by the replace_op (Release) ↔ read_op (Acquire) protocol in
CodeUnits, or make the intent explicit by switching to an acquire-aware pattern
(e.g., use swap/store with Release paired with an explicit Acquire load where
readers check quickened) and add a short comment referencing the CodeUnits
protocol; target the quickening site (self.code.quickened,
self.code.instructions.quicken(), atomic::fence) when making the change.
---
Outside diff comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 2171-2196: The pre-mutation cache invalidation (zelf.modified())
in setattro is currently invoked before the attribute write but there is no
post-commit invalidation, allowing find_name_in_mro to repopulate caches from
the old state; fix this by keeping the existing pre-invalidation and adding a
second invalidation immediately after the mutation commit (after
attributes.write().insert or shift_remove completes and before calling
update_slot::<...>), i.e., invoke zelf.modified() again to ensure
tp_version_tag/cache entries are refreshed and prevent stale cache reuse.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
crates/compiler-core/src/bytecode.rscrates/vm/src/builtins/frame.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/type.rscrates/vm/src/dict_inner.rscrates/vm/src/frame.rscrates/vm/src/types/slot.rs
Sorry, something went wrong.
2954b60 to
0b66b19
Compare
March 4, 2026 04:56
Optimizes user-defined class instantiation MyClass(args...) when tp_new == object.__new__ and __init__ is a simple PyFunction. Allocates the object directly and calls __init__ via invoke_exact_args, bypassing the generic type.__call__ dispatch path.
Change jitted_code from OnceCell to PyMutex<Option<CompiledCode>> so it can be cleared on __code__ assignment. The setter now sets the cached JIT code to None to prevent executing stale machine code.
- range iterator: deduplicate fast_next/next_fast - Replace raw pointer reads/writes in CodeUnits with atomic operations (AtomicU8/AtomicU16) for thread safety - Add read_op (Acquire), read_arg (Relaxed), compare_exchange_op - Use Release ordering in replace_op to synchronize cache writes - Dispatch loop reads opcodes atomically via read_op/read_arg - Fix adaptive counter access: use read/write_adaptive_counter instead of read/write_cache_u16 (was reading wrong bytes) - Add pre-check guards to all specialize_* functions to prevent concurrent specialization races - Move modified() before attribute changes in type.__setattr__ to prevent use-after-free of cached descriptors - Use SeqCst ordering in modified() for version invalidation - Add Release fence after quicken() initialization
For __getattribute__: only use getattro_wrapper when the type itself defines the attribute; otherwise inherit native slot from base class via MRO. For __setattr__/__delattr__: only store setattro_wrapper when the type has its own __setattr__ or __delattr__; otherwise keep the inherited base slot.
write_cache_u32 at cache_base+3 writes 2 code units (positions 3 and 4), but STORE_ATTR only has 4 cache entries (positions 0-3). This overwrites the next instruction with the upper 16 bits of the slot offset. Changed to write_cache_u16/read_cache_u16 since member descriptor offsets fit within u16 (max 65535 bytes).
has_python_cmp incorrectly treated method_descriptor as Python-level comparison methods, causing richcompare slot to use wrapper dispatch instead of inheriting the native slot.
partial_cmp returns None for NaN comparisons. is_some_and incorrectly returned false for all NaN comparisons, but NaN != x should be true per IEEE 754 semantics.
Change lookup_slot_in_mro to return a 3-state SlotLookupResult enum (NativeSlot/PythonMethod/NotFound) instead of Option<T>. Previously, both "found a Python-level method" and "found nothing" returned None, causing incorrect slot inheritance. For example, class Test(Mixin, TestCase) would inherit object.slot_init from Mixin via inherit_from_mro instead of using init_wrapper to dispatch TestCase.__init__. Apply this fix consistently to all slot update sites: update_main_slot!, update_sub_slot!, TpGetattro, TpSetattro, TpDescrSet, TpHash, TpRichcompare, SqAssItem, MpAssSubscript.
- deoptimize() / deoptimize_at(): replace specialized op with base op - adaptive(): decrement warmup counter or call specialize function - commit_specialization(): replace op on success, backoff on failure - execute_binary_op_int() / execute_binary_op_float(): typed binary ops Removes 10 duplicate deoptimize_* functions, consolidates 13 adaptive counter blocks, 6 binary op handlers, and 7 specialize tail patterns. Also replaces inline deopt blocks in LoadAttr/StoreAttr handlers.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/frame.rs (2)
7391-7407: ⚠️ Potential issue | 🔴 CriticalGuard
UnpackSequenceTwoTuplebehind expected arity (UNPACK_SEQUENCE 2)
Instruction::UnpackSequenceTwoTupleis selected from runtime tuple length only. If this opcode isUNPACK_SEQUENCE NwhereN != 2, a later execution can incorrectly succeed with 2 elements instead of raising. This is a correctness break and can corrupt stack expectations.💡 Proposed fix
@@ - Instruction::UnpackSequence { size } => { - self.adaptive(|s, ii, cb| s.specialize_unpack_sequence(vm, ii, cb)); - self.unpack_sequence(size.get(arg), vm) + Instruction::UnpackSequence { size } => { + let expected_size = size.get(arg); + self.adaptive(|s, ii, cb| s.specialize_unpack_sequence(vm, expected_size, ii, cb)); + self.unpack_sequence(expected_size, vm) } @@ - fn specialize_unpack_sequence( + fn specialize_unpack_sequence( &mut self, vm: &VirtualMachine, + expected_size: u32, instr_idx: usize, cache_base: usize, ) { @@ - let new_op = if let Some(tuple) = obj.downcast_ref_if_exact::<PyTuple>(vm) { - if tuple.len() == 2 { + let new_op = if let Some(tuple) = obj.downcast_ref_if_exact::<PyTuple>(vm) { + if expected_size == 2 && tuple.len() == 2 { Some(Instruction::UnpackSequenceTwoTuple) } else { Some(Instruction::UnpackSequenceTuple) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7391 - 7407, The specialization picks Instruction::UnpackSequenceTwoTuple based solely on the runtime tuple length, which can violate the original UNPACK_SEQUENCE N expectation; update specialize_unpack_sequence to first inspect the existing instruction at instr_idx (match on Instruction::UnpackSequence { count }) and only choose UnpackSequenceTwoTuple when count == 2 in addition to the obj being an exact PyTuple of length 2; keep all other paths unchanged so UNPACK_SEQUENCE N (N != 2) still preserves the generic UnpackSequence behavior.
7475-7486: ⚠️ Potential issue | 🟠 MajorAvoid unchecked narrowing of slot offset in
StoreAttrSlotcache
offset as u16can truncate. If that happens, specialization can write to the wrong slot index at runtime. Please guard the conversion and back off specialization when it does not fit.💡 Proposed fix
if let Some(ref descr) = cls_attr && let Some(member_descr) = descr.downcast_ref::<PyMemberDescriptor>() && let MemberGetter::Offset(offset) = member_descr.member.getter { - unsafe { - self.code - .instructions - .write_cache_u32(cache_base + 1, type_version); - self.code - .instructions - .write_cache_u16(cache_base + 3, offset as u16); - self.code - .instructions - .replace_op(instr_idx, Instruction::StoreAttrSlot); - } + if let Ok(slot_offset) = u16::try_from(offset) { + unsafe { + self.code + .instructions + .write_cache_u32(cache_base + 1, type_version); + self.code + .instructions + .write_cache_u16(cache_base + 3, slot_offset); + self.code + .instructions + .replace_op(instr_idx, Instruction::StoreAttrSlot); + } + } else { + unsafe { + self.code + .instructions + .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE); + } + } } else {As per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7475 - 7486, The code currently casts offset as u16 unconditionally in the StoreAttrSlot cache path (see cls_attr, member_descr and MemberGetter::Offset(offset) and the call to write_cache_u16), which can truncate large offsets; add a guard that checks whether offset fits in u16 (e.g. offset <= u16::MAX) before taking the specialization/unsafe path and writing the cache (write_cache_u32/write_cache_u16); if the offset does not fit, back off the specialization and fall through to the generic store-attribute path instead of writing a truncated offset into the cache.
♻️ Duplicate comments (2)
crates/compiler-core/src/bytecode.rs (2)
485-499: ⚠️ Potential issue | 🔴 CriticalSafe readers dereference raw pointers without bounds validation.
Lines [485-499] and [567-574] are safe methods, but they compute/deref raw pointers without checking
index < len. Out-of-range calls can produce UB instead of a panic.Proposed fix
pub fn read_op(&self, index: usize) -> Instruction { let units = unsafe { &*self.0.get() }; + assert!(index < units.len(), "read_op: index out of bounds"); let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8; let byte = unsafe { &*ptr }.load(Ordering::Acquire); unsafe { mem::transmute::<u8, Instruction>(byte) } } pub fn read_arg(&self, index: usize) -> OpArgByte { let units = unsafe { &*self.0.get() }; + assert!(index < units.len(), "read_arg: index out of bounds"); let ptr = units.as_ptr().wrapping_add(index) as *const u8; let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8; OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed)) } pub fn read_adaptive_counter(&self, index: usize) -> u8 { let units = unsafe { &*self.0.get() }; + assert!(index < units.len(), "read_adaptive_counter: index out of bounds"); let ptr = units.as_ptr().wrapping_add(index) as *const u8; let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8; unsafe { &*arg_ptr }.load(Ordering::Relaxed) }#!/bin/bash set -e sed -n '483,500p' crates/compiler-core/src/bytecode.rs sed -n '567,575p' crates/compiler-core/src/bytecode.rsAs per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 567-574
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The safe reader methods read_op and read_arg perform raw pointer arithmetic/derefs without validating index, which can cause UB for out-of-range indices; modify both functions to first obtain units (as now), check that index < units.len() and if not panic with a clear message (e.g., index out of bounds) before doing any pointer arithmetic/dereference, then proceed with the existing AtomicU8 load and transmute/OpArgByte conversion; ensure the check uses the same units reference used for pointer math so bounds are validated against the exact slice.
342-350: ⚠️ Potential issue | 🔴 Critical
AtomicU16pointer casts require alignment thatCodeUnitdoes not guarantee.Lines [509-523] cast
CodeUnit*toAtomicU16*, butCodeUnitat Lines [342-350] is#[repr(C)]with byte-sized fields, so alignment is not guaranteed to be 2. That is UB for atomic u16 accesses.Proposed fix
-#[repr(C)] +#[repr(C, align(2))] #[derive(Copy, Clone, Debug)] pub struct CodeUnit { pub op: Instruction, pub arg: OpArgByte, } -const _: () = assert!(mem::size_of::<CodeUnit>() == 2); +const _: () = { + assert!(mem::size_of::<CodeUnit>() == 2); + assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>()); +};#!/bin/bash set -e sed -n '340,352p' crates/compiler-core/src/bytecode.rs rg -n 'as \*const AtomicU16' crates/compiler-core/src/bytecode.rsAs per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 509-523
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 342 - 350, The CodeUnit struct is not guaranteed 2-byte aligned but later code casts CodeUnit pointers to AtomicU16 (unsafe for misaligned data); fix by making CodeUnit 2-byte aligned—add #[repr(C, align(2))] to the CodeUnit definition (keeping the existing fields op: Instruction and arg: OpArgByte) and verify the size assertion still holds, then confirm the atomic pointer casts around the code that converts CodeUnit* to AtomicU16* (the casts at the section around lines that perform "as *const AtomicU16"/"as *mut AtomicU16") are now safe due to the enforced alignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Around line 7391-7407: The specialization picks
Instruction::UnpackSequenceTwoTuple based solely on the runtime tuple length,
which can violate the original UNPACK_SEQUENCE N expectation; update
specialize_unpack_sequence to first inspect the existing instruction at
instr_idx (match on Instruction::UnpackSequence { count }) and only choose
UnpackSequenceTwoTuple when count == 2 in addition to the obj being an exact
PyTuple of length 2; keep all other paths unchanged so UNPACK_SEQUENCE N (N !=
2) still preserves the generic UnpackSequence behavior.
- Around line 7475-7486: The code currently casts offset as u16 unconditionally
in the StoreAttrSlot cache path (see cls_attr, member_descr and
MemberGetter::Offset(offset) and the call to write_cache_u16), which can
truncate large offsets; add a guard that checks whether offset fits in u16 (e.g.
offset <= u16::MAX) before taking the specialization/unsafe path and writing the
cache (write_cache_u32/write_cache_u16); if the offset does not fit, back off
the specialization and fall through to the generic store-attribute path instead
of writing a truncated offset into the cache.
---
Duplicate comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 485-499: The safe reader methods read_op and read_arg perform raw
pointer arithmetic/derefs without validating index, which can cause UB for
out-of-range indices; modify both functions to first obtain units (as now),
check that index < units.len() and if not panic with a clear message (e.g.,
index out of bounds) before doing any pointer arithmetic/dereference, then
proceed with the existing AtomicU8 load and transmute/OpArgByte conversion;
ensure the check uses the same units reference used for pointer math so bounds
are validated against the exact slice.
- Around line 342-350: The CodeUnit struct is not guaranteed 2-byte aligned but
later code casts CodeUnit pointers to AtomicU16 (unsafe for misaligned data);
fix by making CodeUnit 2-byte aligned—add #[repr(C, align(2))] to the CodeUnit
definition (keeping the existing fields op: Instruction and arg: OpArgByte) and
verify the size assertion still holds, then confirm the atomic pointer casts
around the code that converts CodeUnit* to AtomicU16* (the casts at the section
around lines that perform "as *const AtomicU16"/"as *mut AtomicU16") are now
safe due to the enforced alignment.
ℹ️ Review info
Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: cdcead23-a33a-4961-b807-327211593155
📒 Files selected for processing (7)
crates/compiler-core/src/bytecode.rscrates/vm/src/builtins/frame.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/type.rscrates/vm/src/dict_inner.rscrates/vm/src/frame.rscrates/vm/src/types/slot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/src/dict_inner.rs
- crates/vm/src/builtins/frame.rs
Sorry, something went wrong.
- CONTAINS_OP_SET: add frozenset support in handler and specialize - TO_BOOL_ALWAYS_TRUE: cache type version instead of checking slots - LOAD_GLOBAL_BUILTIN: cache builtins dict version alongside globals - mark_stacks: deoptimize specialized opcodes for correct reachability
0b66b19 to
dd89151
Compare
March 4, 2026 05:10
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin specialization2 |
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
crates/compiler-core/src/bytecode.rs (2)
485-499: ⚠️ Potential issue | 🔴 CriticalAdd bounds checks to all safe atomic readers.
read_op,read_arg, andread_adaptive_counterare safe APIs but dereference pointers derived from uncheckedindex. Out-of-range input is UB instead of a panic.#!/bin/bash set -euo pipefail FILE="crates/compiler-core/src/bytecode.rs" # Verify safe readers and whether they assert bounds. rg -n "pub fn read_op|pub fn read_arg|pub fn read_adaptive_counter|assert!" "$FILE" -A8 -B2🔧 Suggested fix
pub fn read_op(&self, index: usize) -> Instruction { let units = unsafe { &*self.0.get() }; + assert!(index < units.len(), "read_op: index out of bounds"); let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8; let byte = unsafe { &*ptr }.load(Ordering::Acquire); // SAFETY: Only valid Instruction values are stored via replace_op/compare_exchange_op. unsafe { mem::transmute::<u8, Instruction>(byte) } } pub fn read_arg(&self, index: usize) -> OpArgByte { let units = unsafe { &*self.0.get() }; + assert!(index < units.len(), "read_arg: index out of bounds"); let ptr = units.as_ptr().wrapping_add(index) as *const u8; let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8; OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed)) } pub fn read_adaptive_counter(&self, index: usize) -> u8 { let units = unsafe { &*self.0.get() }; + assert!(index < units.len(), "read_adaptive_counter: index out of bounds"); let ptr = units.as_ptr().wrapping_add(index) as *const u8; let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8; unsafe { &*arg_ptr }.load(Ordering::Relaxed) }As per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 567-574
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The safe atomic readers (read_op, read_arg, read_adaptive_counter) currently compute pointers from an unchecked index causing UB on out-of-range input; fix by reading units = unsafe { &*self.0.get() } then perform explicit bounds checks (e.g., assert!(index < units.len()) for read_op and read_adaptive_counter, and assert!(index + 1 < units.len()) for read_arg because it accesses the next byte) before creating/wrapping the pointer; only after those checks compute the ptr/arg_ptr and perform the AtomicU8 load so out-of-range indices panic instead of producing UB.
509-510: ⚠️ Potential issue | 🔴 CriticalGuarantee 2-byte alignment before casting to
AtomicU16.The
CodeUnit* -> AtomicU16*cast requires 2-byte alignment. With#[repr(C)]and two 1-byte fields, this alignment is not guaranteed, so these atomic accesses are UB-prone.#!/bin/bash set -euo pipefail FILE="crates/compiler-core/src/bytecode.rs" # Show CodeUnit layout declaration + AtomicU16 pointer casts. rg -n "struct CodeUnit|repr\\(C|AtomicU16|size_of::<CodeUnit>|align_of::<CodeUnit>" "$FILE" -A3 -B2🔧 Suggested fix
-#[repr(C)] +#[repr(C, align(2))] #[derive(Copy, Clone, Debug)] pub struct CodeUnit { pub op: Instruction, pub arg: OpArgByte, } -const _: () = assert!(mem::size_of::<CodeUnit>() == 2); +const _: () = { + assert!(mem::size_of::<CodeUnit>() == 2); + assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>()); +};As per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 509 - 510, The code casts CodeUnit pointers to AtomicU16 pointers (units.as_ptr().wrapping_add(index) as *const AtomicU16) but CodeUnit (with #[repr(C)] and two 1-byte fields) may not be 2-byte aligned, causing UB; fix by guaranteeing 2-byte alignment on CodeUnit (e.g., annotate the CodeUnit struct with #[repr(C, align(2))]) so the cast is valid, then replace the cast with a safe pointer cast like units.as_ptr().wrapping_add(index).cast::<AtomicU16>() and keep the unsafe { &*ptr }.store(value, Ordering::Relaxed) usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 711-716: The update to __code__ must be atomic with JIT-cache
invalidation to avoid running stale jitted code; modify set___code__ so it
acquires the jitted_code mutex and clears the cached entry before performing
self.code.swap_to_temporary_refs, or hold the same jitted_code lock while doing
both operations so no other thread can observe cleared/uncleared state in
between. Specifically, in set___code__, lock jitted_code (the Mutex used at
*self.jitted_code.lock()), set it to None, then call
self.code.swap_to_temporary_refs(code, vm) while still holding the lock, and
finally release the lock so the swap and cache invalidation occur atomically.
In `@crates/vm/src/builtins/type.rs`:
- Around line 2171-2175: The current pre-invalidation (calling zelf.modified()
once before attribute mutation) leaves a window where a concurrent
find_name_in_mro may cache results under the new tag during the mutation; fix
this by invoking zelf.modified() both before starting the mutation and once more
immediately after finishing attribute modifications so any caches created during
the mutation are replay-invalidated; update the code path where attributes are
changed (the block containing the single zelf.modified() call) to perform a
second modified() post-mutation and ensure this behavior aligns with the
cache-checking logic in find_name_in_mro.
---
Duplicate comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 485-499: The safe atomic readers (read_op, read_arg,
read_adaptive_counter) currently compute pointers from an unchecked index
causing UB on out-of-range input; fix by reading units = unsafe { &*self.0.get()
} then perform explicit bounds checks (e.g., assert!(index < units.len()) for
read_op and read_adaptive_counter, and assert!(index + 1 < units.len()) for
read_arg because it accesses the next byte) before creating/wrapping the
pointer; only after those checks compute the ptr/arg_ptr and perform the
AtomicU8 load so out-of-range indices panic instead of producing UB.
- Around line 509-510: The code casts CodeUnit pointers to AtomicU16 pointers
(units.as_ptr().wrapping_add(index) as *const AtomicU16) but CodeUnit (with
#[repr(C)] and two 1-byte fields) may not be 2-byte aligned, causing UB; fix by
guaranteeing 2-byte alignment on CodeUnit (e.g., annotate the CodeUnit struct
with #[repr(C, align(2))]) so the cast is valid, then replace the cast with a
safe pointer cast like units.as_ptr().wrapping_add(index).cast::<AtomicU16>()
and keep the unsafe { &*ptr }.store(value, Ordering::Relaxed) usage.
ℹ️ Review info
Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 3cc6b83b-aec4-4da8-a316-976f49a64f3c
📒 Files selected for processing (7)
crates/compiler-core/src/bytecode.rscrates/vm/src/builtins/frame.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/type.rscrates/vm/src/dict_inner.rscrates/vm/src/frame.rscrates/vm/src/types/slot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/src/builtins/frame.rs
- crates/vm/src/dict_inner.rs
Sorry, something went wrong.
c058add
into
RustPython:main
Mar 4, 2026
* Add CALL_ALLOC_AND_ENTER_INIT specialization Optimizes user-defined class instantiation MyClass(args...) when tp_new == object.__new__ and __init__ is a simple PyFunction. Allocates the object directly and calls __init__ via invoke_exact_args, bypassing the generic type.__call__ dispatch path. * Invalidate JIT cache when __code__ is reassigned Change jitted_code from OnceCell to PyMutex<Option<CompiledCode>> so it can be cleared on __code__ assignment. The setter now sets the cached JIT code to None to prevent executing stale machine code. * Atomic operations for specialization cache - range iterator: deduplicate fast_next/next_fast - Replace raw pointer reads/writes in CodeUnits with atomic operations (AtomicU8/AtomicU16) for thread safety - Add read_op (Acquire), read_arg (Relaxed), compare_exchange_op - Use Release ordering in replace_op to synchronize cache writes - Dispatch loop reads opcodes atomically via read_op/read_arg - Fix adaptive counter access: use read/write_adaptive_counter instead of read/write_cache_u16 (was reading wrong bytes) - Add pre-check guards to all specialize_* functions to prevent concurrent specialization races - Move modified() before attribute changes in type.__setattr__ to prevent use-after-free of cached descriptors - Use SeqCst ordering in modified() for version invalidation - Add Release fence after quicken() initialization * Fix slot wrapper override for inherited attributes For __getattribute__: only use getattro_wrapper when the type itself defines the attribute; otherwise inherit native slot from base class via MRO. For __setattr__/__delattr__: only store setattro_wrapper when the type has its own __setattr__ or __delattr__; otherwise keep the inherited base slot. * Fix StoreAttrSlot cache overflow corrupting next instruction write_cache_u32 at cache_base+3 writes 2 code units (positions 3 and 4), but STORE_ATTR only has 4 cache entries (positions 0-3). This overwrites the next instruction with the upper 16 bits of the slot offset. Changed to write_cache_u16/read_cache_u16 since member descriptor offsets fit within u16 (max 65535 bytes). * Exclude method_descriptor from has_python_cmp check has_python_cmp incorrectly treated method_descriptor as Python-level comparison methods, causing richcompare slot to use wrapper dispatch instead of inheriting the native slot. * Fix CompareOpFloat NaN handling partial_cmp returns None for NaN comparisons. is_some_and incorrectly returned false for all NaN comparisons, but NaN != x should be true per IEEE 754 semantics. * Fix invoke_exact_args borrow in CallAllocAndEnterInit * Distinguish Python method vs not-found in slot MRO lookup Change lookup_slot_in_mro to return a 3-state SlotLookupResult enum (NativeSlot/PythonMethod/NotFound) instead of Option<T>. Previously, both "found a Python-level method" and "found nothing" returned None, causing incorrect slot inheritance. For example, class Test(Mixin, TestCase) would inherit object.slot_init from Mixin via inherit_from_mro instead of using init_wrapper to dispatch TestCase.__init__. Apply this fix consistently to all slot update sites: update_main_slot!, update_sub_slot!, TpGetattro, TpSetattro, TpDescrSet, TpHash, TpRichcompare, SqAssItem, MpAssSubscript. * Extract specialization helper functions to reduce boilerplate - deoptimize() / deoptimize_at(): replace specialized op with base op - adaptive(): decrement warmup counter or call specialize function - commit_specialization(): replace op on success, backoff on failure - execute_binary_op_int() / execute_binary_op_float(): typed binary ops Removes 10 duplicate deoptimize_* functions, consolidates 13 adaptive counter blocks, 6 binary op handlers, and 7 specialize tail patterns. Also replaces inline deopt blocks in LoadAttr/StoreAttr handlers. * Improve specialization guards and fix mark_stacks - CONTAINS_OP_SET: add frozenset support in handler and specialize - TO_BOOL_ALWAYS_TRUE: cache type version instead of checking slots - LOAD_GLOBAL_BUILTIN: cache builtins dict version alongside globals - mark_stacks: deoptimize specialized opcodes for correct reachability * Auto-format: cargo fmt --all --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary by CodeRabbit
Performance & Stability
Refactor
Behavior