◐ Shell
reader mode source ↗
Skip to content

freelist optimization#7337

Merged
youknowone merged 7 commits into
RustPython:mainfrom
youknowone:worktree-freelist
Mar 4, 2026
Merged

freelist optimization#7337
youknowone merged 7 commits into
RustPython:mainfrom
youknowone:worktree-freelist

Conversation

@youknowone

@youknowone youknowone commented Mar 3, 2026

Copy link
Copy Markdown
Member

Summary by CodeRabbit

  • Chores
    • Added per-type freelists for dicts, floats, lists, and slices to enable reuse of freed objects and reduce allocation overhead.
    • Allocation/deallocation now reuses cached instances when safe, improving performance and latency.
    • Updated spell-check dictionary (added "freelist").

@coderabbitai

coderabbitai Bot commented Mar 3, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds per-type, thread-local freelists and integrates them into allocation/deallocation: PyPayload gains freelist APIs and constants; PyDict, PyFloat, PyList, PySlice implement thread-local pools and limits; PyRef::new_ref and default_dealloc reuse or cache objects when appropriate. freelist token added to spellcheck.

Changes

Cohort / File(s) Summary
Trait & Payload core
crates/vm/src/object/payload.rs
Extend PyPayload with const HAS_FREELIST, const MAX_FREELIST, and unsafe methods freelist_push() / freelist_pop() (defaults no-op). Added NonNull import and safety docs.
Object lifecycle & freelist store
crates/vm/src/object/core.rs
Add FreeList<T> (pub(crate)) with Drop/Deref impls; default_dealloc now tries T::freelist_push for exact types and untracks immediately; PyRef::new_ref attempts T::freelist_pop to reuse cached objects.
Built-ins: freelist implementations
crates/vm/src/builtins/dict.rs, crates/vm/src/builtins/float.rs, crates/vm/src/builtins/list.rs, crates/vm/src/builtins/slice.rs
Each type adds a thread-local *_FREELIST, sets HAS_FREELIST = true, provides MAX_FREELIST, and implements unsafe freelist_push/freelist_pop. slice.rs adds Traverse handling; imports Cell and NonNull.
Tuple note
crates/vm/src/builtins/tuple.rs
Added comment explaining why PyTuple does not use a freelist (shared payload among structseq subtypes).
Spellcheck dictionary
.cspell.dict/cpython.txt
Added token freelist to repository dictionary.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Alloc as PyRef::new_ref
    participant Freelist as Thread-local Freelist
    participant Heap as Heap Allocation
    participant Dealloc as default_dealloc

    rect rgba(100,150,200,0.5)
    Note over App,Alloc: Allocation flow
    App->>Alloc: request new PyRef<T>
    Alloc->>Freelist: call T::freelist_pop() (if exact type)
    alt cached object available
        Freelist-->>Alloc: NonNull<PyObject>
        Alloc->>Alloc: reset fields, assign payload, return reused object
    else no cached object
        Alloc->>Heap: allocate new PyInner<T> (Box)
    end
    Alloc-->>App: initialized PyRef<T>
    end

    rect rgba(200,100,100,0.5)
    Note over App,Dealloc: Deallocation flow
    App->>Dealloc: drop PyObject
    Dealloc->>Freelist: call T::freelist_push(obj) if T::HAS_FREELIST and exact type
    alt freelist accepted
        Freelist->>Freelist: store pointer in thread-local Vec
        Freelist-->>Dealloc: return true (cached)
    else rejected or full
        Dealloc->>Heap: deallocate via Box::from_raw / normal drop
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐇 Small paws tuck pointers tight,
A cache of carrots sleeps at night,
Freed and stored in thread-local tune,
They hop back fast beneath the moon,
A rabbit cheers — reuse takes flight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'freelist optimization' directly corresponds to the main purpose of the changeset, which implements freelist optimization across multiple builtin types (dict, float, list, slice) to improve memory reuse and allocation performance.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the worktree-freelist branch 2 times, most recently from b46a203 to 2b6fc93 Compare March 4, 2026 05:03
Add freelist_push/freelist_pop hooks to PyPayload trait with default
no-op implementations. Modify default_dealloc to offer dead objects
to the freelist before freeing memory. Modify PyRef::new_ref to try
popping from the freelist before allocating.

Implement thread-local freelist for PyFloat (max 100 entries).
Float objects are ideal candidates: fixed-size payload (f64),
no GC tracking, no instance dict, trivial Drop.

Benchmark improvement on float-heavy workloads:
  float_arith: ~12% faster
  float_list:  ~17% faster
Add HAS_FREELIST const to PyPayload trait. For freelist types, GC
untracking is done immediately (not deferred) during dealloc to
prevent race conditions when the object is reused.

Restructure new_ref so freelist-popped objects also get GC tracked,
enabling freelist support for GC-tracked types like PyList.

Add drop_in_place for old payload before writing new one, required
for types with non-trivial Drop (e.g. PyList's RwLock<Vec>).

Implement thread-local freelist for PyList (max 80 entries).
Add thread-local freelist for PyDict (max 80 entries).

Add heaptype check in default_dealloc to prevent subclass instances
from entering the freelist. Subclass objects have a different Python
type than the base type, so reusing them would return objects with
the wrong __class__.

Skip PyTuple freelist: structseq types (stat_result, struct_time)
are static subtypes sharing the same Rust payload, making type-safe
reuse impractical with heaptype check alone.
@youknowone youknowone marked this pull request as ready for review March 4, 2026 06:53

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/object/core.rs`:
- Around line 825-832: The freelist currently stores raw pointers in TLS
(Vec<*mut PyObject>) so thread teardown only drops pointers and leaks the
underlying PyInner<T> allocations; implement a drain that safely converts each
raw pointer back into Box<PyInner<T>> and drops them (e.g.,
drain_freelist::<T>() which calls Box::from_raw for each pointer and lets it
drop), remove or repurpose the #[allow(dead_code)] on drop_freelist_object and
use it inside the drain, then replace each *_FREELIST TLS Vec with a small
owning wrapper type (e.g., FreelistTls<T>) that holds the Vec<*mut PyObject> and
implements Drop to call drain_freelist::<T>(); finally, ensure the TLS is
initialized with this wrapper so thread teardown runs the Drop and frees all
cached objects.

In `@crates/vm/src/object/payload.rs`:
- Around line 69-76: The freelist contract must match how callers use returned
objects: update the freelist_pop contract (and implementation of freelist reuse)
so it guarantees the returned PyObject payload is valid/initialized (or at least
in a state safe to call drop_in_place), or alternatively change PyRef::new_ref
to avoid unconditionally calling drop_in_place on payloads returned from
freelist_pop; reference the symbols freelist_pop and PyRef::new_ref and ensure
either freelist_pop documents/ensures initialized payloads or PyRef::new_ref
detects freelist-origin objects and skips/drop-safely handles uninitialized
payloads to eliminate the UB.

ℹ️ Review info
Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: abf7359e-93a0-453e-94e1-67512b71c568

📥 Commits

Reviewing files that changed from the base of the PR and between c058add and e4d4d11.

📒 Files selected for processing (8)
  • .cspell.dict/cpython.txt
  • crates/vm/src/builtins/dict.rs
  • crates/vm/src/builtins/float.rs
  • crates/vm/src/builtins/list.rs
  • crates/vm/src/builtins/slice.rs
  • crates/vm/src/builtins/tuple.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/object/payload.rs

Replace Cell<Vec<*mut PyObject>> with Cell<FreeList<T>> which implements
Drop to properly deallocate PyInner<T> when threads exit.
Also fix freelist_pop safety doc to match actual contract.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/vm/src/object/core.rs (1)

1784-1806: Consider clarifying the comment about preserved fields.

The comment on line 1794 states "typ, vtable, dict(None), weak_list, slots are preserved" which could be misleading. The weak_list was cleared in drop_slow_inner before the object was pushed to the freelist, so it's preserved only as an empty structure. The comment could be clearer about this:

-                    // typ, vtable, dict(None), weak_list, slots are preserved
+                    // typ, vtable, dict(None), weak_list (empty), slots are preserved from cached object
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/object/core.rs` around lines 1784 - 1806, The comment saying
"typ, vtable, dict(None), weak_list, slots are preserved" is misleading because
drop_slow_inner clears weak_list before pushing to the freelist, so weak_list is
preserved only as an empty structure; update the comment near the freelist reuse
branch (around T::freelist_pop, PyInner, and Box::into_raw usage) to explicitly
state which fields are fully preserved (typ, vtable, slots), which are set to
None/cleared (dict is None), and that weak_list is preserved but intentionally
emptied by drop_slow_inner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/slice.rs`:
- Around line 28-64: PySlice currently has HAS_FREELIST = true but relies on the
auto-generated traverse() only, so tp_clear won't extract its PyObjectRef fields
before freelist caching; add an explicit unsafe impl Traverse for PySlice that
implements clear(&mut self, out: &mut Vec<PyObjectRef>) and pushes self.start,
self.stop, and self.step (if present) into out (mirroring PyTuple/PyList/PyDict
patterns) so child references are removed prior to freelist_push; ensure the
impl coexists with the existing traverse() and freelist_* methods
(freelist_push, freelist_pop) so tp_clear behavior is correct before objects are
cached.

---

Nitpick comments:
In `@crates/vm/src/object/core.rs`:
- Around line 1784-1806: The comment saying "typ, vtable, dict(None), weak_list,
slots are preserved" is misleading because drop_slow_inner clears weak_list
before pushing to the freelist, so weak_list is preserved only as an empty
structure; update the comment near the freelist reuse branch (around
T::freelist_pop, PyInner, and Box::into_raw usage) to explicitly state which
fields are fully preserved (typ, vtable, slots), which are set to None/cleared
(dict is None), and that weak_list is preserved but intentionally emptied by
drop_slow_inner.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 5c9293ca-5ac8-4725-958e-c680f6fdc997

📥 Commits

Reviewing files that changed from the base of the PR and between e4d4d11 and c0b33a6.

📒 Files selected for processing (6)
  • crates/vm/src/builtins/dict.rs
  • crates/vm/src/builtins/float.rs
  • crates/vm/src/builtins/list.rs
  • crates/vm/src/builtins/slice.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/object/payload.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/object/payload.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment
♻️ Duplicate comments (1)
crates/vm/src/builtins/slice.rs (1)

36-45: ⚠️ Potential issue | 🔴 Critical

PySlice::clear() leaves stop alive on freelist-hit.

Line 40’s assumption is incorrect for the freelist path: if freelist_push succeeds, payload drop is deferred, so stop stays as a strong reference inside cached objects. That can retain arbitrary object graphs and breaks expected tp_clear behavior for freelisted objects.

Temporary safe mitigation until full `stop` extraction is implemented
 impl PyPayload for PySlice {
-    const MAX_FREELIST: usize = 1;
-    const HAS_FREELIST: bool = true;
+    const MAX_FREELIST: usize = 0;
+    const HAS_FREELIST: bool = false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/slice.rs` around lines 36 - 45, The clear method for
PySlice leaves self.stop alive on freelist-hit; change PySlice::clear (the
function handling start/stop/step) to extract stop as well (use
self.stop.take()) and push it into out when Some, just like start and step, so
the strong reference is removed immediately even if freelist_push defers payload
drop; ensure the updated logic handles start, stop, and step symmetrically to
avoid retaining object graphs on freelist reuse.
🧹 Nitpick comments (1)
crates/vm/src/object/core.rs (1)

1785-1807: Collapse duplicated allocation fallback in PyRef::new_ref.

The “allocate new PyInner” block is duplicated in both branches. Extracting cached-value selection first makes the control flow tighter and easier to maintain.

Refactor sketch
-        let ptr = if !has_dict && !is_heaptype {
-            if let Some(cached) = unsafe { T::freelist_pop() } {
-                let inner = cached.as_ptr() as *mut PyInner<T>;
-                unsafe {
-                    core::ptr::write(&mut (*inner).ref_count, RefCount::new());
-                    (*inner).gc_bits.store(0, Ordering::Relaxed);
-                    core::ptr::drop_in_place(&mut (*inner).payload);
-                    core::ptr::write(&mut (*inner).payload, payload);
-                    // typ, vtable, slots are preserved; dict is None, weak_list was
-                    // cleared by drop_slow_inner before freelist push
-                }
-                // Drop the caller's typ since the cached object already holds one
-                drop(typ);
-                unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) }
-            } else {
-                let inner = Box::into_raw(PyInner::new(payload, typ, dict));
-                unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) }
-            }
-        } else {
-            let inner = Box::into_raw(PyInner::new(payload, typ, dict));
-            unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) }
-        };
+        let cached = if !has_dict && !is_heaptype {
+            unsafe { T::freelist_pop() }
+        } else {
+            None
+        };
+
+        let ptr = if let Some(cached) = cached {
+            let inner = cached.as_ptr() as *mut PyInner<T>;
+            unsafe {
+                core::ptr::write(&mut (*inner).ref_count, RefCount::new());
+                (*inner).gc_bits.store(0, Ordering::Relaxed);
+                core::ptr::drop_in_place(&mut (*inner).payload);
+                core::ptr::write(&mut (*inner).payload, payload);
+            }
+            drop(typ);
+            unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) }
+        } else {
+            let inner = Box::into_raw(PyInner::new(payload, typ, dict));
+            unsafe { NonNull::new_unchecked(inner.cast::<Py<T>>()) }
+        };

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/object/core.rs` around lines 1785 - 1807, The allocation branch
is duplicated; refactor PyRef::new_ref by first selecting an inner pointer
candidate (call T::freelist_pop() when !has_dict && !is_heaptype to get an
Option<NonNull<PyInner<T>>>), then in a single common block handle either
reinitializing the cached inner (reset ref_count, gc_bits, drop and write
payload, clear dict/weak handling, and drop the incoming typ) when Some(cached)
or allocating with Box::into_raw(PyInner::new(payload, typ, dict)) when None,
and finally return NonNull::new_unchecked(inner.cast::<Py<T>>()). Keep the exact
operations used now (core::ptr::write, drop_in_place, gc_bits.store, drop(typ))
and the same conditions (has_dict/is_heaptype) but remove the duplicated
Box::into_raw code paths so the common logic is executed once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/vm/src/builtins/slice.rs`:
- Around line 36-45: The clear method for PySlice leaves self.stop alive on
freelist-hit; change PySlice::clear (the function handling start/stop/step) to
extract stop as well (use self.stop.take()) and push it into out when Some, just
like start and step, so the strong reference is removed immediately even if
freelist_push defers payload drop; ensure the updated logic handles start, stop,
and step symmetrically to avoid retaining object graphs on freelist reuse.

---

Nitpick comments:
In `@crates/vm/src/object/core.rs`:
- Around line 1785-1807: The allocation branch is duplicated; refactor
PyRef::new_ref by first selecting an inner pointer candidate (call
T::freelist_pop() when !has_dict && !is_heaptype to get an
Option<NonNull<PyInner<T>>>), then in a single common block handle either
reinitializing the cached inner (reset ref_count, gc_bits, drop and write
payload, clear dict/weak handling, and drop the incoming typ) when Some(cached)
or allocating with Box::into_raw(PyInner::new(payload, typ, dict)) when None,
and finally return NonNull::new_unchecked(inner.cast::<Py<T>>()). Keep the exact
operations used now (core::ptr::write, drop_in_place, gc_bits.store, drop(typ))
and the same conditions (has_dict/is_heaptype) but remove the duplicated
Box::into_raw code paths so the common logic is executed once.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 580bf53d-710b-4561-a4c3-4774712af832

📥 Commits

Reviewing files that changed from the base of the PR and between c0b33a6 and 5b78ad5.

📒 Files selected for processing (2)
  • crates/vm/src/builtins/slice.rs
  • crates/vm/src/object/core.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/object/core.rs`:
- Around line 1792-1804: The cached-object branch currently drops the incoming
typ and leaves the cached object's typ unchanged, causing stale class identity;
instead, in the cached branch (inside the unsafe block handling inner: *mut
PyInner<T>) replace the stored typ with the incoming typ: drop the old typ
in-place (core::ptr::drop_in_place(&mut (*inner).typ)) and core::ptr::write(&mut
(*inner).typ, typ) (or equivalently move typ into (*inner).typ) before leaving
the unsafe block, then do not call drop(typ) afterwards; this ensures
PyInner<T>::typ is updated when reusing freelist objects while preserving the
existing payload/vtable handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 2e7d6088-49dd-4928-9830-e5714a64b5c2

📥 Commits

Reviewing files that changed from the base of the PR and between 5b78ad5 and 2266d94.

📒 Files selected for processing (1)
  • crates/vm/src/object/core.rs

Hide details View details @youknowone youknowone merged commit a98c646 into RustPython:main Mar 4, 2026
13 checks passed
@youknowone youknowone deleted the worktree-freelist branch March 4, 2026 11:00
youknowone added a commit to youknowone/RustPython that referenced this pull request Mar 22, 2026
* Add object freelist infrastructure and PyFloat freelist

Add freelist_push/freelist_pop hooks to PyPayload trait with default
no-op implementations. Modify default_dealloc to offer dead objects
to the freelist before freeing memory. Modify PyRef::new_ref to try
popping from the freelist before allocating.

Implement thread-local freelist for PyFloat (max 100 entries).
Float objects are ideal candidates: fixed-size payload (f64),
no GC tracking, no instance dict, trivial Drop.

Benchmark improvement on float-heavy workloads:
  float_arith: ~12% faster
  float_list:  ~17% faster

* Add PyList freelist and GC-safe freelist infrastructure

Add HAS_FREELIST const to PyPayload trait. For freelist types, GC
untracking is done immediately (not deferred) during dealloc to
prevent race conditions when the object is reused.

Restructure new_ref so freelist-popped objects also get GC tracked,
enabling freelist support for GC-tracked types like PyList.

Add drop_in_place for old payload before writing new one, required
for types with non-trivial Drop (e.g. PyList's RwLock<Vec>).

Implement thread-local freelist for PyList (max 80 entries).

* Add PyDict freelist and exact-type guard for freelist push

Add thread-local freelist for PyDict (max 80 entries).

Add heaptype check in default_dealloc to prevent subclass instances
from entering the freelist. Subclass objects have a different Python
type than the base type, so reusing them would return objects with
the wrong __class__.

Skip PyTuple freelist: structseq types (stat_result, struct_time)
are static subtypes sharing the same Rust payload, making type-safe
reuse impractical with heaptype check alone.

* Add PySlice freelist (max 1, matching PySlice_MAXFREELIST)

* Add FreeList<T> wrapper to drain cached objects on thread teardown

Replace Cell<Vec<*mut PyObject>> with Cell<FreeList<T>> which implements
Drop to properly deallocate PyInner<T> when threads exit.
Also fix freelist_pop safety doc to match actual contract.

* Add manual Traverse impl for PySlice with clear() and fix comment

* Deduplicate allocation fallback in PyRef::new_ref
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant