◐ Shell
reader mode source ↗
Skip to content

gc module internal structure and API#6910

Merged
youknowone merged 4 commits into
RustPython:mainfrom
youknowone:gc-internal
Jan 31, 2026
Merged

gc module internal structure and API#6910
youknowone merged 4 commits into
RustPython:mainfrom
youknowone:gc-internal

Conversation

@youknowone

@youknowone youknowone commented Jan 30, 2026

Copy link
Copy Markdown
Member

Only interface, without actual GC

split from #6849

Summary by CodeRabbit

  • New Features
    • Full garbage collection API: enable/disable, manual collection, and generational behavior
    • GC monitoring: per-generation stats, collection counts, debug flags, and inspectable tracked objects
    • GC controls: configurable thresholds, freeze/unfreeze permanent generation, and GC callbacks/garbage lists
  • Behavioral
    • Objects can report GC tracking and expose their referents for inspection

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 30, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Implements a generational GC subsystem and exposes a Python-compatible gc module API, adds VM context fields for gc state, extends PyObject inspection for GC, and updates Cargo features and spellcheck tokens.

Changes

Cohort / File(s) Summary
Spellcheck Dictionary
.cspell.dict/cpython.txt
Added tokens saveall, uncollectable.
GC Module (stdlib)
crates/stdlib/src/gc.rs
New public gc module API: debug flags, enable/disable, collect, thresholds/counts/stats, object introspection, freeze/unfreeze, callbacks invocation, and argument structs.
VM Feature & Exposure
crates/vm/Cargo.toml, crates/vm/src/lib.rs
Added gc feature and made it default; exported pub mod gc_state.
GC Core State
crates/vm/src/gc_state.rs
Introduced GcState, GcGeneration, GcStats, debug flags, tracking registries, freeze/unfreeze, threshold/count/stats accessors, collection stubs, and global singleton accessor.
Object Model Changes
crates/vm/src/object/core.rs
Made gc_finalized public; added is_gc_tracked() and gc_get_referents() on PyObject for tracking and referent enumeration.
VM Context
crates/vm/src/vm/context.rs
Added gc_callbacks: PyListRef and gc_garbage: PyListRef to Context and initialized them at genesis.

Sequence Diagram

sequenceDiagram
    actor Python
    participant GC as "gc module"
    participant GCState as "GcState (global)"
    participant Callbacks as "callbacks list"
    participant VM as "VM Context"

    Python->>GC: collect(generation=None)
    GC->>GCState: collect(generation)
    activate GCState
    GCState->>Callbacks: invoke_callbacks("start", generation, 0, 0)
    Callbacks->>VM: execute registered callbacks
    VM-->>Callbacks: callback results
    GCState->>GCState: perform collection (counts)
    GCState->>Callbacks: invoke_callbacks("stop", generation, collected, uncollectable)
    Callbacks->>VM: execute registered callbacks
    VM-->>Callbacks: callback results
    deactivate GCState
    GCState-->>GC: return (collected, uncollectable)
    GC-->>Python: return counts
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • impl gc finialized #6689: Related changes to PyObject finalization and gc_bits handling which intersects with making gc_finalized public and GC tracking logic.

Suggested reviewers

  • ShaharNaveh

Poem

🐰 I nibble code where cycles hide,
I count three generations wide,
I hop through lists of callbacks bright,
I freeze, unfreeze in moonlit night,
Hooray — the GC burrows right!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'gc module internal structure and API' accurately describes the main purpose of this pull request, which adds the internal GC state structure and comprehensive API surface for the gc module.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jan 30, 2026

Copy link
Copy Markdown
Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin gc-internal

@youknowone

Copy link
Copy Markdown
Member Author

@youknowone youknowone force-pushed the gc-internal branch 3 times, most recently from 8052f5f to 5a8b225 Compare January 31, 2026 08:58
Add gc_state module with GcState, GcGeneration, GcDebugFlags, GcStats.
Replace gc module stubs with working API backed by gc_state.
Add gc_callbacks and gc_garbage to Context.
Add is_gc_tracked, gc_finalized, gc_get_referents to PyObject.
Collection is stubbed (returns 0) — actual algorithm to follow.
@youknowone youknowone marked this pull request as ready for review January 31, 2026 13:12

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/gc.rs`:
- Around line 189-196: get_referrers currently unconditionally returns an empty
list but the review warns against raising NotImplementedError; to match CPython
behavior, either implement basic GC-tracking or explicitly preserve the
empty-list semantics and document the limitation. Update the
get_referrers(FuncArgs, VirtualMachine) implementation to return
vm.ctx.new_list(vec![]) for non-discoverable referrers, add/adjust the doc
comment inside the function to state that only GC-tracked references are
discoverable and stack/C-extension references are not, and do not raise
NotImplementedError; alternatively, if you choose to implement minimal tracking,
modify the scanning logic in get_referrers to consult your GC-tracking
structures and collect referrers before falling back to vm.ctx.new_list(vec![]).

In `@crates/vm/src/gc_state.rs`:
- Around line 231-287: untrack_object currently never removes a deallocated
frozen object from permanent_objects, leaving stale pointers that break unfreeze
and get_freeze_count; update unsafe fn untrack_object to also acquire a write
lock on self.permanent_objects and remove the GcObjectPtr (gc_ptr) just like
tracked_objects and finalized_objects are handled, ensuring the permanent set
does not retain freed pointers (refer to GcObjectPtr, untrack_object,
permanent_objects, unfreeze, and get_freeze_count).

In `@crates/vm/src/object/core.rs`:
- Around line 902-919: is_gc_tracked currently returns capability
(vtable.trace.is_some() || dict.is_some()) instead of actual membership; add a
public query on GcState (e.g., GcState::is_tracked(&self, obj_id: ObjectId) or
similar) that checks the tracked_objects set used by
track_object/untrack_object, then change PyObject::is_gc_tracked to call that
GcState query rather than inspecting vtable.trace or dict; ensure you use the
object's unique identifier or reference the same key used in tracked_objects so
is_gc_tracked reflects real GC membership while gc_get_referents can remain
unchanged.
🧹 Nitpick comments (1)
crates/vm/src/gc_state.rs (1)

289-305: Finalize tracking has two sources of truth.

GcState::{is_finalized, mark_finalized} uses a HashSet, but PyObject::gc_finalized relies on per-object bits and stdlib gc.is_finalized uses that path. Consider consolidating on one mechanism to avoid divergence (e.g., expose set_gc_finalized as pub(crate) and set bits here, or drop the set).

@ShaharNaveh ShaharNaveh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

lgtm!

@github-actions

Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] lib: cpython/Lib/asyncio
[ ] test: cpython/Lib/test/test_asyncio (TODO: 14)

dependencies:

  • asyncio (native: _overlapped, _pyrepl.console, _remote_debugging, _winapi, asyncio.tools, collections.abc, concurrent.futures, errno, itertools, math, msvcrt, sys, time)
    • ast (native: _ast, sys)
    • collections (native: _weakref, itertools, sys)
    • dataclasses (native: itertools, sys)
    • enum (native: builtins, sys)
    • io (native: _io, _thread, errno, sys)
    • logging (native: atexit, errno, logging.handlers, sys, time)
    • os (native: os.path, sys)
    • site (native: _io, builtins, errno, sys)
    • socket (native: _socket, sys)
    • subprocess (native: builtins, errno, sys, time)
    • tempfile (native: _thread, errno, sys)
    • traceback (native: collections.abc, itertools, sys)
    • _colorize, argparse, contextlib, contextvars, functools, heapq, inspect, linecache, reprlib, selectors, signal, stat, struct, threading, types, warnings, weakref

dependent tests: (8 tests)

  • asyncio: test_asyncio test_builtin test_contextlib_async test_inspect test_logging test_os test_sys_settrace test_unittest

[ ] test: cpython/Lib/test/test_dict.py (TODO: 3)
[ ] test: cpython/Lib/test/test_dictcomps.py (TODO: 1)
[ ] test: cpython/Lib/test/test_dictviews.py (TODO: 2)
[ ] test: cpython/Lib/test/test_userdict.py

dependencies:

dependent tests: (no tests depend on dict)

[ ] test: cpython/Lib/test/test_generators.py (TODO: 5)
[ ] test: cpython/Lib/test/test_genexps.py
[ ] test: cpython/Lib/test/test_generator_stop.py
[ ] test: cpython/Lib/test/test_yield_from.py (TODO: 10)

dependencies:

dependent tests: (no tests depend on generator)

[ ] lib: cpython/Lib/subprocess.py
[ ] test: cpython/Lib/test/test_subprocess.py (TODO: 9)

dependencies:

  • subprocess

dependent tests: (51 tests)

  • subprocess: test_android test_asyncio test_audit test_bz2 test_c_locale_coercion test_cmd_line test_cmd_line_script test_ctypes test_dtrace test_faulthandler test_file_eintr test_gzip test_inspect test_json test_msvcrt test_ntpath test_os test_platform test_plistlib test_poll test_py_compile test_regrtest test_repl test_runpy test_script_helper test_shutil test_signal test_site test_sqlite3 test_subprocess test_support test_sys test_sysconfig test_tempfile test_threading test_unittest test_urllib2 test_utf8_mode test_venv test_wait3 test_webbrowser test_zipfile
    • asyncio: test_asyncio test_builtin test_contextlib_async test_logging test_sys_settrace test_unittest
    • ctypes.util: test_ctypes
    • ensurepip: test_ensurepip
    • multiprocessing.util: test_concurrent_futures

[x] lib: cpython/Lib/weakref.py
[x] lib: cpython/Lib/_weakrefset.py
[x] test: cpython/Lib/test/test_weakref.py (TODO: 23)

dependencies:

  • weakref

dependent tests: (146 tests)

  • weakref: test_array test_ast test_asyncio test_code test_concurrent_futures test_context test_contextlib test_copy test_deque test_descr test_dict test_exceptions test_file test_fileio test_functools test_generators test_genericalias test_importlib test_inspect test_io test_ipaddress test_itertools test_logging test_memoryview test_mmap test_ordered_dict test_pickle test_picklebuffer test_queue test_re test_scope test_set test_slice test_socket test_sqlite3 test_ssl test_struct test_tempfile test_thread test_threading test_threading_local test_types test_typing test_unittest test_uuid test_weakref test_weakset test_xml_etree
    • asyncio: test_asyncio test_builtin test_contextlib_async test_os test_sys_settrace test_unittest
    • concurrent.futures.process: test_concurrent_futures
    • copy: test_bytes test_codecs test_collections test_csv test_defaultdict test_dictviews test_email test_enum test_fractions test_http_cookies test_opcache test_optparse test_platform test_plistlib test_posix test_site test_statistics test_sysconfig test_tomllib test_xml_dom_minicompat test_zlib
      • dataclasses: test__colorize test_patma test_pprint test_regrtest test_zoneinfo
      • email.generator: test_email
      • http.cookiejar: test_http_cookiejar
      • http.server: test_httpservers test_robotparser test_urllib2_localnet
      • mailbox: test_mailbox
      • smtplib: test_smtplib test_smtpnet
      • tarfile: test_shutil test_tarfile
    • inspect: test_abc test_argparse test_asyncgen test_decimal test_grammar test_ntpath test_operator test_posixpath test_signal test_traceback test_yield_from test_zipimport
      • bdb: test_bdb
      • importlib.metadata: test_importlib
      • rlcompleter: test_rlcompleter
      • trace: test_trace
      • xmlrpc.server: test_docxmlrpc test_xmlrpc
    • multiprocessing: test_concurrent_futures test_fcntl test_multiprocessing_main_handling
    • symtable: test_symtable
    • tempfile: test_bz2 test_cmd_line test_compile test_compileall test_dis test_doctest test_ensurepip test_faulthandler test_filecmp test_fileinput test_importlib test_linecache test_pathlib test_pkg test_pkgutil test_py_compile test_runpy test_selectors test_string_literals test_subprocess test_support test_tabnanny test_threadedtempfile test_urllib test_urllib2 test_urllib_response test_venv test_zipapp test_zipfile test_zipfile64 test_zstd
      • urllib.request: test_urllib2net test_urllibnet

[ ] test: cpython/Lib/test/test_weakset.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on weakset)

[ ] lib: cpython/Lib/zoneinfo
[ ] test: cpython/Lib/test/test_zoneinfo (TODO: 3)

dependencies:

  • zoneinfo
    • datetime (native: _thread, math, sys, time)
    • sysconfig (native: os.path, sys)
    • collections, os
    • bisect, calendar, functools, re, struct, weakref

dependent tests: (3 tests)

  • zoneinfo: test_email test_plistlib test_zoneinfo

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@crates/vm/src/gc_state.rs`:
- Around line 236-252: In track_object, a poisoned write lock on
generation_objects[0] can cause the object to be inserted into tracked_objects
but not generation_objects (and counts not incremented), creating inconsistency
with get_objects or generation counts; fix by making the operation atomic:
either return a Result from track_object (propagate the lock error from
generation_objects[0].write()) or acquire/check both locks first and only insert
into tracked_objects if generation_objects[0].write() succeeded (i.e., perform
the generation_objects[0].write() and insert into gen0 and update
generations[0].count/alloc_count before touching tracked_objects), referencing
track_object, generation_objects, tracked_objects, generations, alloc_count and
get_objects so the change is applied where those symbols are used.
- Around line 391-429: The freeze() and unfreeze() functions drain sets into
local vectors then attempt to write the destination lock, which can fail and
silently drop objects; change them to handle lock errors defensively: after
draining a generation (generation_objects.write() in freeze) or draining
permanent (permanent_objects.write() in unfreeze), check the result of acquiring
the destination lock (permanent_objects.write() in freeze,
generation_objects[2].write() in unfreeze) and if it Err, re-acquire the
original source lock (or re-insert the drained pointers back into the original
generation/permanent) or else log a fatal error/panic so objects are not lost;
reference freeze, unfreeze, generation_objects, permanent_objects,
permanent.count, generations[*].count and ensure counts are only updated after
successful insertion into the destination set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants