Align marshal and .pyc with CPython 3.14 #7958
read_marshal_bytes, _str, _str_vec, _name_tuple, and _const_tuple now take a shared ref table and resolve TYPE_REF / register FLAG_REF entries. deserialize_code is split into a public wrapper and an inner function that receives the ref table; deserialize_value_depth opens a fresh inner ref space when it hits Type::Code, mirroring CPython's behaviour of putting the code object itself at ref slot 0. Nested code objects inside const tuples reuse the surrounding code's ref space via the new read_const_value helper.
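The ref-table mechanism described above can be illustrated with a minimal Python sketch. This is not the RustPython decoder: it handles only two value types plus back-references, keeps a single ref table for the whole stream (it does not model the inner ref space opened for code objects), and the names `decode` and `_read` are made up for illustration. The type-code and flag values are the ones used by CPython's marshal format.

```python
import struct

FLAG_REF = 0x80               # high bit of the type byte: "remember this object"
TYPE_REF = ord("r")           # back-reference: 4-byte little-endian index follows
TYPE_SHORT_ASCII = ord("z")   # 1-byte length, then that many ASCII bytes
TYPE_SMALL_TUPLE = ord(")")   # 1-byte length, then that many values


def decode(data: bytes):
    """Decode one value from a toy subset of the marshal format."""
    refs = []  # shared ref table, threaded through every recursive read
    value, pos = _read(data, 0, refs)
    if pos != len(data):
        raise ValueError("trailing bytes after value")
    return value


def _read(data: bytes, pos: int, refs: list):
    head = data[pos]
    pos += 1
    type_code = head & 0x7F
    wants_ref = bool(head & FLAG_REF)

    if type_code == TYPE_REF:
        # Resolve an earlier FLAG_REF entry by index.
        (index,) = struct.unpack_from("<I", data, pos)
        return refs[index], pos + 4

    if type_code == TYPE_SHORT_ASCII:
        length = data[pos]
        value = data[pos + 1 : pos + 1 + length].decode("ascii")
        if wants_ref:
            refs.append(value)
        return value, pos + 1 + length

    if type_code == TYPE_SMALL_TUPLE:
        length = data[pos]
        pos += 1
        slot = None
        if wants_ref:
            # Containers reserve their slot before reading their elements,
            # so elements registered later get higher indices.
            slot = len(refs)
            refs.append(None)
        items = []
        for _ in range(length):
            item, pos = _read(data, pos, refs)
            items.append(item)
        value = tuple(items)
        if slot is not None:
            refs[slot] = value
        return value, pos

    raise ValueError(f"unsupported type code {type_code:#x}")
```

For example, `decode(b"\xa9\x02\xfa\x02ab" + b"r\x01\x00\x00\x00")` yields `("ab", "ab")`: the flagged tuple takes slot 0, the flagged string takes slot 1, and the trailing TYPE_REF resolves index 1.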
… 3.14 PYC_MAGIC_NUMBER changes from 2994 to 3627, matching CPython 3.14's pyc_magic_number_token (0x0a0d0e2b). marshal FORMAT_VERSION drops from 5 to 4 (the encoder/marshal.version value; the decoder already accepts both). check_pyc_magic_number_bytes now compares all four magic bytes instead of the first two.
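The relationship between the 16-bit magic number and the four-byte token can be checked with a short Python sketch. It assumes the standard .pyc layout (a little-endian 16-bit magic followed by `\r\n`); the function names are illustrative and do not correspond to RustPython's API.

```python
import struct

# Value quoted in the commit message above.
PYC_MAGIC_NUMBER = 3627


def pyc_magic_bytes(magic: int) -> bytes:
    # Little-endian u16 magic followed by CR LF, four bytes in total.
    return struct.pack("<H", magic) + b"\r\n"


def pyc_magic_token(magic: int) -> int:
    # The same four bytes read back as one little-endian 32-bit integer.
    return int.from_bytes(pyc_magic_bytes(magic), "little")


def check_pyc_magic(header: bytes, magic: int = PYC_MAGIC_NUMBER) -> bool:
    # Compare all four bytes, so a header with the right number but a
    # corrupted CR LF tail is rejected as well.
    return header[:4] == pyc_magic_bytes(magic)
```

Here `pyc_magic_token(3627)` evaluates to `0x0a0d0e2b`, the token value cited above. A check that only looked at the first two bytes would accept `b"\x2b\x0e\x00\x00"`, which `check_pyc_magic` rejects.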
Two fixture-based tests pin the marshal decoder against actual CPython 3.14 marshal.dumps() output: a trivial module that exercises FLAG_REF plus TYPE_REF for qualname, and a module with a nested function that exercises ref sharing between a const tuple and its surrounding code object.
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
This PR upgrades RustPython to support the CPython 3.14 marshal format by updating the PYC magic number constant to match CPython 3.14 (3627), reducing the marshal FORMAT_VERSION to 4, and refactoring code-object deserialization to correctly share and resolve a ref table across nested marshal values using TYPE_REF and FLAG_REF. Version-dependent checks in the import and sys modules are updated accordingly.
Changes
CPython 3.14 Marshal Codec Upgrade
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 5 passed
SourceFileLoader.get_code now also looks for .pyc files using
_RP_FALLBACK_CACHE_TAGS (currently ('cpython-314',)) in addition to
sys.implementation.cache_tag. The matched .pyc is only used for
reading; recompilation still writes to the RustPython-tagged path, so
CPython's .pyc is never overwritten. Source-stat / hash / timestamp
validation logic is unchanged.
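The lookup order described in this commit message can be sketched in Python as follows. This is an illustration only, not the patched `SourceFileLoader.get_code`: the helper names are hypothetical, the `__pycache__/<name>.<tag>.pyc` layout is the standard PEP 3147 one, and the tag `rustpython-0_4` used in the usage note is a placeholder.

```python
import posixpath

# Name and value taken from the commit message above.
RP_FALLBACK_CACHE_TAGS = ("cpython-314",)


def candidate_pyc_paths(source_path, own_tag, fallback_tags=RP_FALLBACK_CACHE_TAGS):
    """Return .pyc paths to try for reading, the interpreter's own tag first."""
    directory, filename = posixpath.split(source_path)
    stem = filename.rpartition(".")[0] or filename
    tags = [own_tag] + [tag for tag in fallback_tags if tag != own_tag]
    return [
        posixpath.join(directory, "__pycache__", f"{stem}.{tag}.pyc")
        for tag in tags
    ]


def pyc_write_path(source_path, own_tag):
    """Recompilation only ever writes to the interpreter's own tagged path."""
    return candidate_pyc_paths(source_path, own_tag, fallback_tags=())[0]
```

With the placeholder tag, `candidate_pyc_paths("pkg/mod.py", "rustpython-0_4")` lists the RustPython-tagged file first and `pkg/__pycache__/mod.cpython-314.pyc` second, while `pyc_write_path` returns only the former, which is why the CPython file is never overwritten.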
CPython's marshal supports TYPE_SLICE from format version 4 onwards and that is the default version. Rejecting slice dumps below version 5 made marshal.dumps(slice(...)) fail with the default version and broke test.test_marshal.SliceTestCase.test_slice.
Lib/importlib/_bootstrap_external.py is CPython's own code copied verbatim; local patches here defeat compatibility tracking. The cpython-XX cache_tag fallback needs to live on the RustPython side (Rust code or sys.implementation.cache_tag policy), not as edits to the imported standard library. This reverts commit 1fc426d0fb5fcdb50d35cad13bbb43e8f6ce1c7f.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/compiler-core/src/marshal.rs (1)
825-845: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Dict keys now reject code objects.
This path still decodes keys by calling `deserialize_value_typed(...)` directly. After Line 857, `Type::Code` is only handled in `deserialize_value_depth`, so a dict key that marshals as a code object now fails with `BadType` even though `serialize_value` can still emit it.
Route pre-read dict keys through the same raw-header/code-object path as `deserialize_value_depth` instead of maintaining a second decoder here.
As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
Also applies to: 857-857
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/compiler-core/src/marshal.rs` around lines 825 - 845, The dict-key branch duplicates decoding logic and calls deserialize_value_typed(...) directly, which bypasses the raw-header/code-object handling in deserialize_value_depth and causes Type::Code keys to be rejected; change the key decoding in this branch to first determine the key's type_code/header (using the same raw/FLAG_REF/ref-slot handling as shown) then invoke the common decoder path used by deserialize_value_depth (so code objects go through the same raw-header/code-object route), i.e., extract the differing value (the key_type or ref index) and call the shared decode routine rather than duplicating deserialize_value_typed logic; update use of refs, key_slot, and k to follow the same flow as deserialize_value_depth so Type::Code is accepted for dict keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 472-474: The tuple deserialization resets the recursion budget and
allows nested code->const tuple->code chains to bypass limits; change
read_marshal_const_tuple (and similar helpers used in lines ~487-516) to accept
the current depth parameter and call read_const_value(rdr, bag, depth - 1, refs)
for each element instead of using MAX_MARSHAL_STACK_DEPTH, and ensure
read_const_value passes depth - 1 when it recurses into deserialize_code_inner
so the depth budget is threaded through and decremented across code-object
helper boundaries (references: read_marshal_const_tuple, read_const_value,
deserialize_code_inner, MAX_MARSHAL_STACK_DEPTH).
- Around line 704-715: The current Type::Code branch in deserialize_value_depth
creates a fresh inner_refs Vec (and only reserves slot 0 when FLAG_REF is set)
which resets the TYPE_REF index space and prevents code-object fields from
resolving refs from the outer stream; change this to reuse the caller's ref memo
or maintain an index-aligned mapping so TYPE_REF indices remain consistent:
instead of creating a new inner_refs Vec in the Type::Code arm, pass the
existing ref memo (or construct a mapping that pre-populates entries to preserve
index alignment) into deserialize_code_inner, ensuring deserialize_code_inner
and bag.make_code operate against the same ref index space as the surrounding
marshal stream (referencing deserialize_value_depth, inner_refs,
deserialize_code_inner, Bag::ConstantBag::Constant, TYPE_REF, FLAG_REF, and
bag.make_code).
In `@crates/vm/src/stdlib/sys.rs`:
- Around line 661-662: The cache tag change makes RustPython emit the same
__pycache__ filenames as CPython, risking cross-interpreter .pyc conflicts;
revert the tag to a RustPython-specific prefix (e.g., change the cache_tag
creation from "cpython-{}{}" to "rustpython-{}{}" using version::MAJOR and
version::MINOR) and/or add defensive validation in the .pyc loader to detect and
refuse incompatible marshalled bytecode (so when reading .pyc files you check
the tag/magic and fail-safe rather than attempting to execute another
interpreter's bytecode).
In `@crates/vm/src/version.rs`:
- Around line 72-73: Update the PYC_MAGIC_NUMBER constant from 3627 to 3658 to
match CPython 3.14: change the value of PYC_MAGIC_NUMBER in version.rs and
update the accompanying comment to reflect CPython 3.14; then run and adjust any
unit tests or logic that assert or compute against PYC_MAGIC_NUMBER (search for
references to PYC_MAGIC_NUMBER, pyc magic, or functions using that constant) so
they expect 3658 and any derived behavior remains correct.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d31894f2-0beb-4988-9b37-3783ed0c14e9
📒 Files selected for processing (5)
crates/compiler-core/src/marshal.rs
crates/vm/src/import.rs
crates/vm/src/stdlib/marshal.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/version.rs
Bidirectional
Use the CPython compatibility version (e.g. cpython-314) instead of
the rustpython-{MAJOR_IMPL}_{MINOR_IMPL} interpreter version string.
Py_MARSHAL_VERSION is 5 in CPython 3.14.5 (Include/marshal.h:16) and TYPE_SLICE serialization rejects version < 5 (Python/marshal.c:720). Restore the same threshold and constant so marshal.version and the slice-marshal gate match CPython.
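The version gate described here amounts to a single comparison, modelled below in Python. The constant and function names are illustrative, and the error message is an assumption about what an "unmarshallable" failure looks like, not a quote from either implementation.

```python
MARSHAL_VERSION = 5      # Py_MARSHAL_VERSION as quoted above
MIN_SLICE_VERSION = 5    # slices are rejected for any version below this


def check_can_dump_slice(version: int = MARSHAL_VERSION) -> None:
    """Raise if a slice may not be serialized at the requested version."""
    if version < MIN_SLICE_VERSION:
        # Assumed message; the real wording may differ.
        raise ValueError("unmarshallable object")
```

With both constants at 5, a dump at the default version passes the gate, while an explicit request for version 4 is rejected.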
Code objects embedded in const-tuples reset the depth budget on each recursion, so a hostile or pathological marshal stream of code-in-tuple-in-code can blow the stack despite MAX_MARSHAL_STACK_DEPTH. Pass the current depth through deserialize_code_inner and read_marshal_const_tuple and decrement at each code-object/tuple boundary. Also route dict keys through deserialize_value_after_header so TYPE_CODE keys decode instead of failing with BadType.
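The depth-threading fix can be illustrated with a toy Python model. Everything in it is a stand-in: values are `(kind, payload)` pairs rather than marshal bytes, the limit of 10 is arbitrary, and the function names only loosely echo the Rust helpers named above.

```python
MAX_DEPTH = 10  # arbitrary small limit for the illustration


class DepthExceeded(ValueError):
    pass


def read_value(node, depth=MAX_DEPTH):
    if depth <= 0:
        raise DepthExceeded("marshal nesting too deep")
    kind, payload = node
    if kind == "code":
        return read_code(payload, depth - 1)
    if kind == "tuple":
        return tuple(read_value(child, depth - 1) for child in payload)
    return payload  # leaf value


def read_code(consts, depth):
    # Helper boundary: keep spending the caller's remaining budget instead
    # of starting again from MAX_DEPTH.
    return {"consts": read_const_tuple(consts, depth)}


def read_const_tuple(items, depth):
    return tuple(read_value(item, depth - 1) for item in items)


def nest(levels):
    """Build a code-in-tuple-in-code chain of the given number of levels."""
    node = ("int", 1)
    for _ in range(levels):
        node = ("code", [node])
    return node
```

`read_value(nest(2))` decodes normally, while `read_value(nest(50))` raises `DepthExceeded`. If `read_const_tuple` restarted from `MAX_DEPTH` for each element, the 50-level chain would decode without ever hitting the limit, which is the bug the commit describes.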
Merged commit b5ff41c into RustPython:main on May 24, 2026