gc module internal structure and API#6910
Conversation
📝 WalkthroughWalkthroughImplements a generational GC subsystem and exposes a Python-compatible gc module API, adds VM context fields for gc state, extends PyObject inspection for GC, and updates Cargo features and spellcheck tokens. Changes
Sequence DiagramsequenceDiagram
actor Python
participant GC as "gc module"
participant GCState as "GcState (global)"
participant Callbacks as "callbacks list"
participant VM as "VM Context"
Python->>GC: collect(generation=None)
GC->>GCState: collect(generation)
activate GCState
GCState->>Callbacks: invoke_callbacks("start", generation, 0, 0)
Callbacks->>VM: execute registered callbacks
VM-->>Callbacks: callback results
GCState->>GCState: perform collection (counts)
GCState->>Callbacks: invoke_callbacks("stop", generation, collected, uncollectable)
Callbacks->>VM: execute registered callbacks
VM-->>Callbacks: callback results
deactivate GCState
GCState-->>GC: return (collected, uncollectable)
GC-->>Python: return counts
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin gc-internal |
Sorry, something went wrong.
8052f5f to
5a8b225
Compare
January 31, 2026 08:58
Add gc_state module with GcState, GcGeneration, GcDebugFlags, GcStats. Replace gc module stubs with working API backed by gc_state. Add gc_callbacks and gc_garbage to Context. Add is_gc_tracked, gc_finalized, gc_get_referents to PyObject. Collection is stubbed (returns 0) — actual algorithm to follow.
d63fba5 to
4ff9fb8
Compare
January 31, 2026 09:55
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@crates/stdlib/src/gc.rs`:
- Around line 189-196: get_referrers currently unconditionally returns an empty
list but the review warns against raising NotImplementedError; to match CPython
behavior, either implement basic GC-tracking or explicitly preserve the
empty-list semantics and document the limitation. Update the
get_referrers(FuncArgs, VirtualMachine) implementation to return
vm.ctx.new_list(vec![]) for non-discoverable referrers, add/adjust the doc
comment inside the function to state that only GC-tracked references are
discoverable and stack/C-extension references are not, and do not raise
NotImplementedError; alternatively, if you choose to implement minimal tracking,
modify the scanning logic in get_referrers to consult your GC-tracking
structures and collect referrers before falling back to vm.ctx.new_list(vec![]).
In `@crates/vm/src/gc_state.rs`:
- Around line 231-287: untrack_object currently never removes a deallocated
frozen object from permanent_objects, leaving stale pointers that break unfreeze
and get_freeze_count; update unsafe fn untrack_object to also acquire a write
lock on self.permanent_objects and remove the GcObjectPtr (gc_ptr) just like
tracked_objects and finalized_objects are handled, ensuring the permanent set
does not retain freed pointers (refer to GcObjectPtr, untrack_object,
permanent_objects, unfreeze, and get_freeze_count).
In `@crates/vm/src/object/core.rs`:
- Around line 902-919: is_gc_tracked currently returns capability
(vtable.trace.is_some() || dict.is_some()) instead of actual membership; add a
public query on GcState (e.g., GcState::is_tracked(&self, obj_id: ObjectId) or
similar) that checks the tracked_objects set used by
track_object/untrack_object, then change PyObject::is_gc_tracked to call that
GcState query rather than inspecting vtable.trace or dict; ensure you use the
object's unique identifier or reference the same key used in tracked_objects so
is_gc_tracked reflects real GC membership while gc_get_referents can remain
unchanged.
🧹 Nitpick comments (1)
crates/vm/src/gc_state.rs (1)
289-305: Finalize tracking has two sources of truth.
GcState::{is_finalized, mark_finalized}uses a HashSet, butPyObject::gc_finalizedrelies on per-object bits and stdlibgc.is_finalizeduses that path. Consider consolidating on one mechanism to avoid divergence (e.g., exposeset_gc_finalizedaspub(crate)and set bits here, or drop the set).
Sorry, something went wrong.
ShaharNaveh
left a comment
There was a problem hiding this comment.
lgtm!
Sorry, something went wrong.
cec949d to
06c8270
Compare
January 31, 2026 14:03
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] lib: cpython/Lib/asyncio dependencies:
dependent tests: (8 tests)
[ ] test: cpython/Lib/test/test_dict.py (TODO: 3) dependencies: dependent tests: (no tests depend on dict) [ ] test: cpython/Lib/test/test_generators.py (TODO: 5) dependencies: dependent tests: (no tests depend on generator) [ ] lib: cpython/Lib/subprocess.py dependencies:
dependent tests: (51 tests)
[x] lib: cpython/Lib/weakref.py dependencies:
dependent tests: (146 tests)
[ ] test: cpython/Lib/test/test_weakset.py (TODO: 1) dependencies: dependent tests: (no tests depend on weakset) [ ] lib: cpython/Lib/zoneinfo dependencies:
dependent tests: (3 tests)
Legend:
|
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@crates/vm/src/gc_state.rs`:
- Around line 236-252: In track_object, a poisoned write lock on
generation_objects[0] can cause the object to be inserted into tracked_objects
but not generation_objects (and counts not incremented), creating inconsistency
with get_objects or generation counts; fix by making the operation atomic:
either return a Result from track_object (propagate the lock error from
generation_objects[0].write()) or acquire/check both locks first and only insert
into tracked_objects if generation_objects[0].write() succeeded (i.e., perform
the generation_objects[0].write() and insert into gen0 and update
generations[0].count/alloc_count before touching tracked_objects), referencing
track_object, generation_objects, tracked_objects, generations, alloc_count and
get_objects so the change is applied where those symbols are used.
- Around line 391-429: The freeze() and unfreeze() functions drain sets into
local vectors then attempt to write the destination lock, which can fail and
silently drop objects; change them to handle lock errors defensively: after
draining a generation (generation_objects.write() in freeze) or draining
permanent (permanent_objects.write() in unfreeze), check the result of acquiring
the destination lock (permanent_objects.write() in freeze,
generation_objects[2].write() in unfreeze) and if it Err, re-acquire the
original source lock (or re-insert the drained pointers back into the original
generation/permanent) or else log a fatal error/panic so objects are not lost;
reference freeze, unfreeze, generation_objects, permanent_objects,
permanent.count, generations[*].count and ensure counts are only updated after
successful insertion into the destination set.
Sorry, something went wrong.
714d1ce
into
RustPython:main
Jan 31, 2026
Only interface, without actual GC
split from #6849
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.