Remove Frame mutex and use DataStack bump allocator for LocalsPlus by youknowone · Pull Request #7333 · RustPython/RustPython
changed the title
Worktree frame datastack
frame datastack
youknowone
changed the title
frame datastack
Remove Frame mutex and use DataStack bump allocator for LocalsPlus
Move stack, cells_frees, prev_line out of the mutex-protected FrameState into Frame as FrameUnsafeCell fields. This eliminates mutex lock/unlock overhead on every frame execution (with_exec). Safety relies on the same single-threaded execution guarantee that FastLocals already uses.
Introduce DataStack with linked chunks (16KB initial, doubling) and push/pop bump allocation. Add datastack field to VirtualMachine. Not yet wired to frame creation.
Replace separate FastLocals (Box<[Option<PyObjectRef>]>) and BoxVec<Option<PyStackRef>> with a single LocalsPlus struct that stores both in a contiguous Box<[usize]> array. The first nlocalsplus slots are fastlocals and the rest is the evaluation stack. Typed access is provided through transmute-based methods. Remove BoxVec import from frame.rs.
Normal function calls now bump-allocate LocalsPlus data from the per-thread DataStack instead of a separate heap allocation. Generator/coroutine frames continue using heap allocation since they outlive the call. On frame exit, data is copied to the heap (materialize_to_heap) to preserve locals for tracebacks, then the DataStack is popped. VirtualMachine.datastack is wrapped in UnsafeCell for interior mutability (safe because frame allocation is single-threaded LIFO).
Update vectorcall dispatch functions to use localsplus stack accessors instead of direct stack field access. Add stack_truncate method to LocalsPlus. Update vectorcall fast path in function.rs to use datastack and fastlocals_mut().
Check both bounds of the current chunk when determining if a pop base is in the current chunk. The previous check (base >= chunk_start) fails on Windows where newer chunks may be allocated at lower addresses than older ones.
Two fixes for Cell-based types used in static items under non-threading mode, which cause data races when Rust test runner uses parallel threads: 1. LazyLock: use std::sync::LazyLock when std is available instead of wrapping core::cell::LazyCell with a false `unsafe impl Sync`. The LazyCell wrapper is kept only for no-std (truly single-threaded). 2. gc_state: use static_cell! (thread-local in non-threading mode) instead of OnceLock, so each thread gets its own GcState with Cell-based PyRwLock/PyMutex that are not accessed concurrently.
youknowone added a commit to youknowone/RustPython that referenced this pull request
…ustPython#7333) * Remove PyMutex<FrameState> from Frame, use UnsafeCell fields directly Move stack, cells_frees, prev_line out of the mutex-protected FrameState into Frame as FrameUnsafeCell fields. This eliminates mutex lock/unlock overhead on every frame execution (with_exec). Safety relies on the same single-threaded execution guarantee that FastLocals already uses. * Add thread-local DataStack for bump-allocating frame data Introduce DataStack with linked chunks (16KB initial, doubling) and push/pop bump allocation. Add datastack field to VirtualMachine. Not yet wired to frame creation. * Unify FastLocals and BoxVec stack into LocalsPlus Replace separate FastLocals (Box<[Option<PyObjectRef>]>) and BoxVec<Option<PyStackRef>> with a single LocalsPlus struct that stores both in a contiguous Box<[usize]> array. The first nlocalsplus slots are fastlocals and the rest is the evaluation stack. Typed access is provided through transmute-based methods. Remove BoxVec import from frame.rs. * Use DataStack for LocalsPlus in non-generator function calls Normal function calls now bump-allocate LocalsPlus data from the per-thread DataStack instead of a separate heap allocation. Generator/coroutine frames continue using heap allocation since they outlive the call. On frame exit, data is copied to the heap (materialize_to_heap) to preserve locals for tracebacks, then the DataStack is popped. VirtualMachine.datastack is wrapped in UnsafeCell for interior mutability (safe because frame allocation is single-threaded LIFO). * Fix clippy: import Layout from core::alloc instead of alloc::alloc * Fix vectorcall compatibility with LocalsPlus API Update vectorcall dispatch functions to use localsplus stack accessors instead of direct stack field access. Add stack_truncate method to LocalsPlus. Update vectorcall fast path in function.rs to use datastack and fastlocals_mut(). * Add datastack, nlocalsplus, ncells, tstate to cspell dictionary * Fix DataStack pop() for non-monotonic allocation addresses Check both bounds of the current chunk when determining if a pop base is in the current chunk. The previous check (base >= chunk_start) fails on Windows where newer chunks may be allocated at lower addresses than older ones. * Fix stale comments: release_datastack -> materialize_localsplus * Fix non-threading mode for parallel test execution Two fixes for Cell-based types used in static items under non-threading mode, which cause data races when Rust test runner uses parallel threads: 1. LazyLock: use std::sync::LazyLock when std is available instead of wrapping core::cell::LazyCell with a false `unsafe impl Sync`. The LazyCell wrapper is kept only for no-std (truly single-threaded). 2. gc_state: use static_cell! (thread-local in non-threading mode) instead of OnceLock, so each thread gets its own GcState with Cell-based PyRwLock/PyMutex that are not accessed concurrently. * Fix CallAllocAndEnterInit to use LocalsPlus stack API * Use checked arithmetic in LocalsPlus and DataStack allocators * Address code review: checked arithmetic, threading feature deps, Send gate - Use checked arithmetic for nlocalsplus in Frame::new - Add "std" to threading feature dependencies in rustpython-common - Gate GcState Send impl with #[cfg(feature = "threading")] * Clean up comments: remove redundant/stale remarks, fix CPython references
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters