freelist optimization#7337
Conversation
📝 WalkthroughWalkthroughAdds per-type, thread-local freelists and integrates them into allocation/deallocation: Changes
Sequence DiagramsequenceDiagram
participant App as Application
participant Alloc as PyRef::new_ref
participant Freelist as Thread-local Freelist
participant Heap as Heap Allocation
participant Dealloc as default_dealloc
rect rgba(100,150,200,0.5)
Note over App,Alloc: Allocation flow
App->>Alloc: request new PyRef<T>
Alloc->>Freelist: call T::freelist_pop() (if exact type)
alt cached object available
Freelist-->>Alloc: NonNull<PyObject>
Alloc->>Alloc: reset fields, assign payload, return reused object
else no cached object
Alloc->>Heap: allocate new PyInner<T> (Box)
end
Alloc-->>App: initialized PyRef<T>
end
rect rgba(200,100,100,0.5)
Note over App,Dealloc: Deallocation flow
App->>Dealloc: drop PyObject
Dealloc->>Freelist: call T::freelist_push(obj) if T::HAS_FREELIST and exact type
alt freelist accepted
Freelist->>Freelist: store pointer in thread-local Vec
Freelist-->>Dealloc: return true (cached)
else rejected or full
Dealloc->>Heap: deallocate via Box::from_raw / normal drop
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
b46a203 to
2b6fc93
Compare
March 4, 2026 05:03
Add freelist_push/freelist_pop hooks to PyPayload trait with default no-op implementations. Modify default_dealloc to offer dead objects to the freelist before freeing memory. Modify PyRef::new_ref to try popping from the freelist before allocating. Implement thread-local freelist for PyFloat (max 100 entries). Float objects are ideal candidates: fixed-size payload (f64), no GC tracking, no instance dict, trivial Drop. Benchmark improvement on float-heavy workloads: float_arith: ~12% faster float_list: ~17% faster
Add HAS_FREELIST const to PyPayload trait. For freelist types, GC untracking is done immediately (not deferred) during dealloc to prevent race conditions when the object is reused. Restructure new_ref so freelist-popped objects also get GC tracked, enabling freelist support for GC-tracked types like PyList. Add drop_in_place for old payload before writing new one, required for types with non-trivial Drop (e.g. PyList's RwLock<Vec>). Implement thread-local freelist for PyList (max 80 entries).
Add thread-local freelist for PyDict (max 80 entries). Add heaptype check in default_dealloc to prevent subclass instances from entering the freelist. Subclass objects have a different Python type than the base type, so reusing them would return objects with the wrong __class__. Skip PyTuple freelist: structseq types (stat_result, struct_time) are static subtypes sharing the same Rust payload, making type-safe reuse impractical with heaptype check alone.
2b6fc93 to
a7c6c7e
Compare
March 4, 2026 06:52
a7c6c7e to
e4d4d11
Compare
March 4, 2026 06:53
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/object/core.rs`:
- Around line 825-832: The freelist currently stores raw pointers in TLS
(Vec<*mut PyObject>) so thread teardown only drops pointers and leaks the
underlying PyInner<T> allocations; implement a drain that safely converts each
raw pointer back into Box<PyInner<T>> and drops them (e.g.,
drain_freelist::<T>() which calls Box::from_raw for each pointer and lets it
drop), remove or repurpose the #[allow(dead_code)] on drop_freelist_object and
use it inside the drain, then replace each *_FREELIST TLS Vec with a small
owning wrapper type (e.g., FreelistTls<T>) that holds the Vec<*mut PyObject> and
implements Drop to call drain_freelist::<T>(); finally, ensure the TLS is
initialized with this wrapper so thread teardown runs the Drop and frees all
cached objects.
In `@crates/vm/src/object/payload.rs`:
- Around line 69-76: The freelist contract must match how callers use returned
objects: update the freelist_pop contract (and implementation of freelist reuse)
so it guarantees the returned PyObject payload is valid/initialized (or at least
in a state safe to call drop_in_place), or alternatively change PyRef::new_ref
to avoid unconditionally calling drop_in_place on payloads returned from
freelist_pop; reference the symbols freelist_pop and PyRef::new_ref and ensure
either freelist_pop documents/ensures initialized payloads or PyRef::new_ref
detects freelist-origin objects and skips/drop-safely handles uninitialized
payloads to eliminate the UB.
ℹ️ Review info
Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: abf7359e-93a0-453e-94e1-67512b71c568
📒 Files selected for processing (8)
.cspell.dict/cpython.txtcrates/vm/src/builtins/dict.rscrates/vm/src/builtins/float.rscrates/vm/src/builtins/list.rscrates/vm/src/builtins/slice.rscrates/vm/src/builtins/tuple.rscrates/vm/src/object/core.rscrates/vm/src/object/payload.rs
Sorry, something went wrong.
Replace Cell<Vec<*mut PyObject>> with Cell<FreeList<T>> which implements Drop to properly deallocate PyInner<T> when threads exit. Also fix freelist_pop safety doc to match actual contract.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/vm/src/object/core.rs (1)
1784-1806: Consider clarifying the comment about preserved fields.The comment on line 1794 states "typ, vtable, dict(None), weak_list, slots are preserved" which could be misleading. The
weak_listwas cleared indrop_slow_innerbefore the object was pushed to the freelist, so it's preserved only as an empty structure. The comment could be clearer about this:- // typ, vtable, dict(None), weak_list, slots are preserved + // typ, vtable, dict(None), weak_list (empty), slots are preserved from cached object🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/object/core.rs` around lines 1784 - 1806, The comment saying "typ, vtable, dict(None), weak_list, slots are preserved" is misleading because drop_slow_inner clears weak_list before pushing to the freelist, so weak_list is preserved only as an empty structure; update the comment near the freelist reuse branch (around T::freelist_pop, PyInner, and Box::into_raw usage) to explicitly state which fields are fully preserved (typ, vtable, slots), which are set to None/cleared (dict is None), and that weak_list is preserved but intentionally emptied by drop_slow_inner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/slice.rs`:
- Around line 28-64: PySlice currently has HAS_FREELIST = true but relies on the
auto-generated traverse() only, so tp_clear won't extract its PyObjectRef fields
before freelist caching; add an explicit unsafe impl Traverse for PySlice that
implements clear(&mut self, out: &mut Vec<PyObjectRef>) and pushes self.start,
self.stop, and self.step (if present) into out (mirroring PyTuple/PyList/PyDict
patterns) so child references are removed prior to freelist_push; ensure the
impl coexists with the existing traverse() and freelist_* methods
(freelist_push, freelist_pop) so tp_clear behavior is correct before objects are
cached.
---
Nitpick comments:
In `@crates/vm/src/object/core.rs`:
- Around line 1784-1806: The comment saying "typ, vtable, dict(None), weak_list,
slots are preserved" is misleading because drop_slow_inner clears weak_list
before pushing to the freelist, so weak_list is preserved only as an empty
structure; update the comment near the freelist reuse branch (around
T::freelist_pop, PyInner, and Box::into_raw usage) to explicitly state which
fields are fully preserved (typ, vtable, slots), which are set to None/cleared
(dict is None), and that weak_list is preserved but intentionally emptied by
drop_slow_inner.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 5c9293ca-5ac8-4725-958e-c680f6fdc997
📒 Files selected for processing (6)
crates/vm/src/builtins/dict.rscrates/vm/src/builtins/float.rscrates/vm/src/builtins/list.rscrates/vm/src/builtins/slice.rscrates/vm/src/object/core.rscrates/vm/src/object/payload.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/vm/src/object/payload.rs
Sorry, something went wrong.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/vm/src/builtins/slice.rs (1)
36-45: ⚠️ Potential issue | 🔴 Critical
PySlice::clear()leavesstopalive on freelist-hit.Line 40’s assumption is incorrect for the freelist path: if
freelist_pushsucceeds, payload drop is deferred, sostopstays as a strong reference inside cached objects. That can retain arbitrary object graphs and breaks expectedtp_clearbehavior for freelisted objects.Temporary safe mitigation until full `stop` extraction is implemented
impl PyPayload for PySlice { - const MAX_FREELIST: usize = 1; - const HAS_FREELIST: bool = true; + const MAX_FREELIST: usize = 0; + const HAS_FREELIST: bool = false;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/slice.rs` around lines 36 - 45, The clear method for PySlice leaves self.stop alive on freelist-hit; change PySlice::clear (the function handling start/stop/step) to extract stop as well (use self.stop.take()) and push it into out when Some, just like start and step, so the strong reference is removed immediately even if freelist_push defers payload drop; ensure the updated logic handles start, stop, and step symmetrically to avoid retaining object graphs on freelist reuse.
🧹 Nitpick comments (1)
crates/vm/src/object/core.rs (1)
1785-1807: Collapse duplicated allocation fallback inPyRef::new_ref.The “allocate new
PyInner” block is duplicated in both branches. Extracting cached-value selection first makes the control flow tighter and easier to maintain.Refactor sketch
- let ptr = if !has_dict && !is_heaptype { - if let Some(cached) = unsafe { T::freelist_pop() } { - let inner = cached.as_ptr() as *mut PyInner<T>; - unsafe { - core::ptr::write(&mut (*inner).ref_count, RefCount::new()); - (*inner).gc_bits.store(0, Ordering::Relaxed); - core::ptr::drop_in_place(&mut (*inner).payload); - core::ptr::write(&mut (*inner).payload, payload); - // typ, vtable, slots are preserved; dict is None, weak_list was - // cleared by drop_slow_inner before freelist push - } - // Drop the caller's typ since the cached object already holds one - drop(typ); - unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) } - } else { - let inner = Box::into_raw(PyInner::new(payload, typ, dict)); - unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) } - } - } else { - let inner = Box::into_raw(PyInner::new(payload, typ, dict)); - unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) } - }; + let cached = if !has_dict && !is_heaptype { + unsafe { T::freelist_pop() } + } else { + None + }; + + let ptr = if let Some(cached) = cached { + let inner = cached.as_ptr() as *mut PyInner<T>; + unsafe { + core::ptr::write(&mut (*inner).ref_count, RefCount::new()); + (*inner).gc_bits.store(0, Ordering::Relaxed); + core::ptr::drop_in_place(&mut (*inner).payload); + core::ptr::write(&mut (*inner).payload, payload); + } + drop(typ); + unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) } + } else { + let inner = Box::into_raw(PyInner::new(payload, typ, dict)); + unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) } + };As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/object/core.rs` around lines 1785 - 1807, The allocation branch is duplicated; refactor PyRef::new_ref by first selecting an inner pointer candidate (call T::freelist_pop() when !has_dict && !is_heaptype to get an Option<NonNull<PyInner<T>>>), then in a single common block handle either reinitializing the cached inner (reset ref_count, gc_bits, drop and write payload, clear dict/weak handling, and drop the incoming typ) when Some(cached) or allocating with Box::into_raw(PyInner::new(payload, typ, dict)) when None, and finally return NonNull::new_unchecked(inner.cast::<Py<T>>()). Keep the exact operations used now (core::ptr::write, drop_in_place, gc_bits.store, drop(typ)) and the same conditions (has_dict/is_heaptype) but remove the duplicated Box::into_raw code paths so the common logic is executed once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/vm/src/builtins/slice.rs`:
- Around line 36-45: The clear method for PySlice leaves self.stop alive on
freelist-hit; change PySlice::clear (the function handling start/stop/step) to
extract stop as well (use self.stop.take()) and push it into out when Some, just
like start and step, so the strong reference is removed immediately even if
freelist_push defers payload drop; ensure the updated logic handles start, stop,
and step symmetrically to avoid retaining object graphs on freelist reuse.
---
Nitpick comments:
In `@crates/vm/src/object/core.rs`:
- Around line 1785-1807: The allocation branch is duplicated; refactor
PyRef::new_ref by first selecting an inner pointer candidate (call
T::freelist_pop() when !has_dict && !is_heaptype to get an
Option<NonNull<PyInner<T>>>), then in a single common block handle either
reinitializing the cached inner (reset ref_count, gc_bits, drop and write
payload, clear dict/weak handling, and drop the incoming typ) when Some(cached)
or allocating with Box::into_raw(PyInner::new(payload, typ, dict)) when None,
and finally return NonNull::new_unchecked(inner.cast::<Py<T>>()). Keep the exact
operations used now (core::ptr::write, drop_in_place, gc_bits.store, drop(typ))
and the same conditions (has_dict/is_heaptype) but remove the duplicated
Box::into_raw code paths so the common logic is executed once.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 580bf53d-710b-4561-a4c3-4774712af832
📒 Files selected for processing (2)
crates/vm/src/builtins/slice.rscrates/vm/src/object/core.rs
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/object/core.rs`:
- Around line 1792-1804: The cached-object branch currently drops the incoming
typ and leaves the cached object's typ unchanged, causing stale class identity;
instead, in the cached branch (inside the unsafe block handling inner: *mut
PyInner<T>) replace the stored typ with the incoming typ: drop the old typ
in-place (core::ptr::drop_in_place(&mut (*inner).typ)) and core::ptr::write(&mut
(*inner).typ, typ) (or equivalently move typ into (*inner).typ) before leaving
the unsafe block, then do not call drop(typ) afterwards; this ensures
PyInner<T>::typ is updated when reusing freelist objects while preserving the
existing payload/vtable handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 2e7d6088-49dd-4928-9830-e5714a64b5c2
📒 Files selected for processing (1)
crates/vm/src/object/core.rs
Sorry, something went wrong.
a98c646
into
RustPython:main
Mar 4, 2026
* Add object freelist infrastructure and PyFloat freelist Add freelist_push/freelist_pop hooks to PyPayload trait with default no-op implementations. Modify default_dealloc to offer dead objects to the freelist before freeing memory. Modify PyRef::new_ref to try popping from the freelist before allocating. Implement thread-local freelist for PyFloat (max 100 entries). Float objects are ideal candidates: fixed-size payload (f64), no GC tracking, no instance dict, trivial Drop. Benchmark improvement on float-heavy workloads: float_arith: ~12% faster float_list: ~17% faster * Add PyList freelist and GC-safe freelist infrastructure Add HAS_FREELIST const to PyPayload trait. For freelist types, GC untracking is done immediately (not deferred) during dealloc to prevent race conditions when the object is reused. Restructure new_ref so freelist-popped objects also get GC tracked, enabling freelist support for GC-tracked types like PyList. Add drop_in_place for old payload before writing new one, required for types with non-trivial Drop (e.g. PyList's RwLock<Vec>). Implement thread-local freelist for PyList (max 80 entries). * Add PyDict freelist and exact-type guard for freelist push Add thread-local freelist for PyDict (max 80 entries). Add heaptype check in default_dealloc to prevent subclass instances from entering the freelist. Subclass objects have a different Python type than the base type, so reusing them would return objects with the wrong __class__. Skip PyTuple freelist: structseq types (stat_result, struct_time) are static subtypes sharing the same Rust payload, making type-safe reuse impractical with heaptype check alone. * Add PySlice freelist (max 1, matching PySlice_MAXFREELIST) * Add FreeList<T> wrapper to drain cached objects on thread teardown Replace Cell<Vec<*mut PyObject>> with Cell<FreeList<T>> which implements Drop to properly deallocate PyInner<T> when threads exit. Also fix freelist_pop safety doc to match actual contract. * Add manual Traverse impl for PySlice with clear() and fix comment * Deduplicate allocation fallback in PyRef::new_ref
Summary by CodeRabbit