Align marshal and .pyc with CPython 3.14 #7958

Merged
youknowone merged 10 commits into RustPython:main from youknowone:marshal
May 24, 2026


@youknowone

@youknowone youknowone commented May 23, 2026

Member

Summary by CodeRabbit

  • Bug Fixes

    • Fixed bytecode deserialization to correctly share and resolve references across nested bytecode objects.
  • Chores

    • Updated bytecode version constants to version 4.
    • Changed sys.implementation.cache_tag to the cpython-{MAJOR}{MINOR} format (for example, cpython-314).
  • Tests

    • Added comprehensive test coverage for CPython 3.14 bytecode marshaling, including reference resolution scenarios.


read_marshal_bytes, _str, _str_vec, _name_tuple, and _const_tuple now
take a shared ref table and resolve TYPE_REF / register FLAG_REF
entries. deserialize_code is split into a public wrapper and an inner
function that receives the ref table; deserialize_value_depth opens a
fresh inner ref space when it hits Type::Code, mirroring CPython's
behaviour of putting the code object itself at ref slot 0. Nested code
objects inside const tuples reuse the surrounding code's ref space via
the new read_const_value helper.
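A minimal Python sketch of the ref-table rules described above: a type byte with FLAG_REF (0x80) set appends the decoded object to one shared table, and TYPE_REF ('r') followed by a 4-byte little-endian index returns an earlier entry. The `Reader` class is hypothetical and covers only a small subset of types; it is not RustPython's Rust decoder. A flagged container claims its slot before its items are read, so indices follow the order in which the writer first emitted each object.

```python
import marshal

FLAG_REF = 0x80      # high bit of the type byte: "remember this object"
TYPE_REF = ord("r")  # back-reference: 4-byte index into the ref table


class Reader:
    """Hypothetical ref-aware reader for a small subset of marshal types."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0
        self.refs: list = []  # one shared ref table for the whole stream

    def byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise EOFError("truncated marshal data")
        self.pos += n
        return chunk

    def u32(self) -> int:
        return int.from_bytes(self.take(4), "little")

    def read(self):
        header = self.byte()
        code, flagged = header & ~FLAG_REF, header & FLAG_REF
        if code == TYPE_REF:
            return self.refs[self.u32()]
        # Claim the slot up front so nested objects get later indices.
        slot = None
        if flagged:
            slot = len(self.refs)
            self.refs.append(None)
        c = chr(code)
        if c == "N":
            obj = None
        elif c in "TF":
            obj = c == "T"
        elif c == "i":
            obj = int.from_bytes(self.take(4), "little", signed=True)
        elif c in "zZ":  # short ASCII (plain / interned): 1-byte length
            obj = self.take(self.byte()).decode("ascii")
        elif c in "aA":  # ASCII (plain / interned): 4-byte length
            obj = self.take(self.u32()).decode("ascii")
        elif c in ")(":  # small tuple (1-byte length) / tuple (4-byte length)
            n = self.byte() if c == ")" else self.u32()
            obj = tuple(self.read() for _ in range(n))
        elif c == "[":
            obj = []
            if slot is not None:
                self.refs[slot] = obj  # the list exists before its items
            for _ in range(self.u32()):
                obj.append(self.read())
        else:
            raise ValueError(f"unsupported type code {c!r}")
        if slot is not None:
            self.refs[slot] = obj
        return obj
```

A back-reference yields the same object rather than a copy: decoding a two-element list whose second item is `r` plus index 0 gives two references to one string.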
… 3.14

PYC_MAGIC_NUMBER changes from 2994 to 3627, matching CPython 3.14's
pyc_magic_number_token (0x0a0d0e2b). marshal FORMAT_VERSION drops from
5 to 4 (the encoder/marshal.version value; the decoder already accepts
both). check_pyc_magic_number_bytes now compares all four magic bytes
instead of the first two.
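For reference, the token arithmetic as a Python sketch (function names are hypothetical): the 16-bit magic number occupies the low two bytes and b'\r\n' the high two, and a .pyc stores the token little-endian in its first four bytes. Comparing all four bytes also rejects a file whose first two bytes merely happen to match.

```python
PYC_MAGIC_NUMBER = 3627  # CPython 3.14 value quoted in this PR


def magic_token(magic: int) -> int:
    """Combine the 2-byte magic number with b'\\r\\n' into the 4-byte token."""
    return magic | (ord("\r") << 16) | (ord("\n") << 24)


def magic_bytes(magic: int) -> bytes:
    # .pyc files store the token little-endian in their first four bytes.
    return magic_token(magic).to_bytes(4, "little")


def check_pyc_magic(header: bytes, magic: int = PYC_MAGIC_NUMBER) -> bool:
    # Compare all four bytes, not just the first two.
    return header[:4] == magic_bytes(magic)
```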
Two fixture-based tests pin the marshal decoder against actual CPython
3.14 marshal.dumps() output: a trivial module that exercises FLAG_REF
plus TYPE_REF for qualname, and a module with a nested function that
exercises ref sharing between a const tuple and its surrounding code
object.
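The fixture bytes themselves are not reproduced in this thread. As a hedged sketch, fixtures like these could be regenerated by running a script along the following lines under python3.14. `SRC`, `nested_code_names`, and the Rust-style byte-array rendering in `fixture_literal` are illustrative assumptions, not the PR's actual test code.

```python
import marshal
import types

# A module with a nested function, similar in spirit to the second fixture.
SRC = "def outer():\n    def inner():\n        return 42\n    return inner\n"


def nested_code_names(co: types.CodeType) -> list:
    """Collect co_name of every code object reachable through co_consts."""
    names = []
    for const in co.co_consts:
        if isinstance(const, types.CodeType):
            names.append(const.co_name)
            names.extend(nested_code_names(const))
    return names


def fixture_literal(data: bytes, width: int = 12) -> str:
    """Render bytes as a Rust-style byte-array literal for pasting into a test."""
    rows = [
        "    " + " ".join(f"0x{b:02x}," for b in data[i:i + width])
        for i in range(0, len(data), width)
    ]
    return "&[\n" + "\n".join(rows) + "\n]"


code = compile(SRC, "<fixture>", "exec")
data = marshal.dumps(code)
print(fixture_literal(data))
```

The nested function is what matters here: `inner`'s code object sits in `outer`'s const tuple, so decoding it exercises the shared ref space the commit describes.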
@coderabbitai

coderabbitai Bot commented May 23, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between b0b8ede and cd63341.

📒 Files selected for processing (2)
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/stdlib/sys.rs

Walkthrough

This PR upgrades RustPython to support CPython 3.14 marshal format by updating the PYC magic number constant to match CPython 3.14 (3627), reducing the marshal FORMAT_VERSION to 4, and refactoring code-object deserialization to correctly share and resolve a ref-table across nested marshal values using TYPE_REF and FLAG_REF flags. Version-dependent checks in import and sys modules are updated accordingly.

Changes

CPython 3.14 Marshal Codec Upgrade

  • CPython 3.14 version constants (crates/vm/src/version.rs): PYC_MAGIC_NUMBER updated to 3627, with derived constants (PYC_MAGIC_NUMBER_TOKEN and PYC_MAGIC_NUMBER_BYTES) automatically reflecting the new value.
  • Code-object deserialization with shared ref-table (crates/compiler-core/src/marshal.rs): FORMAT_VERSION reduced to 4. deserialize_code initializes a mutable ref-table and delegates to deserialize_code_inner. Code-object fields (bytecode, constants, names, linetable, exceptiontable) are decoded via ref-aware helpers that interpret TYPE_REF and register values via FLAG_REF. Value deserialization is reorganized through deserialize_value_after_header to construct inner ref-tables for nested code objects. Dict decoding updated to use ref-aware reading. Type::Code removed from deserialize_value_typed.
  • Marshal deserialization tests for CPython 3.14 (crates/compiler-core/src/marshal.rs): Unit tests added for CPython 3.14 marshal output, including FLAG_REF on code objects, TYPE_REF resolution for qualname, and verification that nested code objects within const tuples share the surrounding code's ref-table space.
  • Version-dependent compatibility checks (crates/vm/src/import.rs, crates/vm/src/stdlib/sys.rs): Magic-number prefix check in import.rs updated to pass PYC_MAGIC_NUMBER_BYTES directly. sys.implementation.cache_tag format changed to cpython-{MAJOR}{MINOR}.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • RustPython/RustPython#7588: Adjusts VM marshal and import call sites to pass the updated PyVmBag and construct PyCode correctly, complementing the deserialization refactoring in the same marshal code-object pipeline.

Poem

🐰 Ref-tables dance through bytecode streams,
3.14 arrives in marshal dreams—
Nested codes now share their space,
FLAG_REF brings order, TYPE_REF its grace!
📚✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Align marshal and .pyc with CPython 3.14' directly describes the primary objective of the PR, which updates RustPython's marshal and .pyc handling to align with CPython 3.14 across multiple files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



SourceFileLoader.get_code now also looks for .pyc files using
_RP_FALLBACK_CACHE_TAGS (currently ('cpython-314',)) in addition to
sys.implementation.cache_tag. The matched .pyc is only used for
reading; recompilation still writes to the RustPython-tagged path, so
CPython's .pyc is never overwritten. Source-stat / hash / timestamp
validation logic is unchanged.
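A sketch of the lookup order this commit describes (the commit is reverted further down): read from the first existing .pyc among the interpreter's own tag and the fallback tags, but always write to the interpreter's own tag. The helper names and the `rustpython-x_y` placeholder tag are hypothetical; only `cpython-314` comes from the commit message.

```python
from pathlib import Path

PRIMARY_TAG = "rustpython-x_y"   # hypothetical stand-in for sys.implementation.cache_tag
FALLBACK_TAGS = ("cpython-314",)  # as named in the commit message


def pyc_path(source: Path, tag: str) -> Path:
    """PEP 3147 layout: <dir>/__pycache__/<stem>.<tag>.pyc"""
    return source.parent / "__pycache__" / f"{source.stem}.{tag}.pyc"


def pyc_for_reading(source: Path):
    """Return the first existing .pyc, preferring the primary tag, else None."""
    for tag in (PRIMARY_TAG, *FALLBACK_TAGS):
        candidate = pyc_path(source, tag)
        if candidate.is_file():
            return candidate
    return None


def pyc_for_writing(source: Path) -> Path:
    # Recompilation always targets the primary tag, so CPython's file is never overwritten.
    return pyc_path(source, PRIMARY_TAG)
```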
CPython's marshal supports TYPE_SLICE from format version 4 onwards
and that is the default version. Rejecting slice dumps below version
5 made marshal.dumps(slice(...)) fail with the default version and
broke test.test_marshal.SliceTestCase.test_slice.

Lib/importlib/_bootstrap_external.py is CPython's own code copied
verbatim; local patches here defeat compatibility tracking. The
cpython-XX cache_tag fallback needs to live on the RustPython side
(Rust code or sys.implementation.cache_tag policy), not as edits to
the imported standard library.

This reverts commit 1fc426d0fb5fcdb50d35cad13bbb43e8f6ce1c7f.
@youknowone youknowone marked this pull request as ready for review May 24, 2026 04:43
@youknowone youknowone changed the title Align marshal May 24, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/compiler-core/src/marshal.rs (1)

825-845: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dict keys now reject code objects.

This path still decodes keys by calling deserialize_value_typed(...) directly. After Line 857, Type::Code is only handled in deserialize_value_depth, so a dict key that marshals as a code object now fails with BadType even though serialize_value can still emit it.

Route pre-read dict keys through the same raw-header/code-object path as deserialize_value_depth instead of maintaining a second decoder here.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."

Also applies to: 857-857
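One way to follow that suggestion, sketched in Python with a hypothetical subset decoder: the dict loop must pre-read each key's header byte to spot the TYPE_NULL terminator, and it then passes that byte to the same `read_after_header` function every other value uses. Any type the shared path supports (code objects, in RustPython's case) is therefore accepted as a key. The sketch masks FLAG_REF but keeps no ref table.

```python
FLAG_REF = 0x80


class Reader:
    """Hypothetical subset decoder: None, 32-bit ints, short ASCII, dicts."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_object(self):
        # Read the header once, then hand off to the shared decoder.
        return self.read_after_header(self.take(1)[0])

    def read_after_header(self, header: int):
        # FLAG_REF is masked off only; this sketch does not track refs.
        code = chr(header & ~FLAG_REF)
        if code == "N":
            return None
        if code == "i":
            return int.from_bytes(self.take(4), "little", signed=True)
        if code in "zZ":
            return self.take(self.take(1)[0]).decode("ascii")
        if code == "{":
            result = {}
            while True:
                key_header = self.take(1)[0]
                if key_header == ord("0"):  # TYPE_NULL terminates the dict
                    return result
                # Same decoder as any other value: no second copy of the logic.
                key = self.read_after_header(key_header)
                result[key] = self.read_object()
        raise ValueError(f"unsupported type code {code!r}")
```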

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/compiler-core/src/marshal.rs` around lines 825 - 845, The dict-key
branch duplicates decoding logic and calls deserialize_value_typed(...)
directly, which bypasses the raw-header/code-object handling in
deserialize_value_depth and causes Type::Code keys to be rejected; change the
key decoding in this branch to first determine the key's type_code/header (using
the same raw/FLAG_REF/ref-slot handling as shown) then invoke the common decoder
path used by deserialize_value_depth (so code objects go through the same
raw-header/code-object route), i.e., extract the differing value (the key_type
or ref index) and call the shared decode routine rather than duplicating
deserialize_value_typed logic; update use of refs, key_slot, and k to follow the
same flow as deserialize_value_depth so Type::Code is accepted for dict keys.
🤖 Prompt for all review comments with AI agents

Inline comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 472-474: The tuple deserialization resets the recursion budget and
allows nested code->const tuple->code chains to bypass limits; change
read_marshal_const_tuple (and similar helpers used in lines ~487-516) to accept
the current depth parameter and call read_const_value(rdr, bag, depth - 1, refs)
for each element instead of using MAX_MARSHAL_STACK_DEPTH, and ensure
read_const_value passes depth - 1 when it recurses into deserialize_code_inner
so the depth budget is threaded through and decremented across code-object
helper boundaries (references: read_marshal_const_tuple, read_const_value,
deserialize_code_inner, MAX_MARSHAL_STACK_DEPTH).
- Around line 704-715: The current Type::Code branch in deserialize_value_depth
creates a fresh inner_refs Vec (and only reserves slot 0 when FLAG_REF is set)
which resets the TYPE_REF index space and prevents code-object fields from
resolving refs from the outer stream; change this to reuse the caller's ref memo
or maintain an index-aligned mapping so TYPE_REF indices remain consistent:
instead of creating a new inner_refs Vec in the Type::Code arm, pass the
existing ref memo (or construct a mapping that pre-populates entries to preserve
index alignment) into deserialize_code_inner, ensuring deserialize_code_inner
and bag.make_code operate against the same ref index space as the surrounding
marshal stream (referencing deserialize_value_depth, inner_refs,
deserialize_code_inner, Bag::ConstantBag::Constant, TYPE_REF, FLAG_REF, and
bag.make_code).

In `@crates/vm/src/stdlib/sys.rs`:
- Around line 661-662: The cache tag change makes RustPython emit the same
__pycache__ filenames as CPython, risking cross-interpreter .pyc conflicts;
revert the tag to a RustPython-specific prefix (e.g., change the cache_tag
creation from "cpython-{}{}" to "rustpython-{}{}" using version::MAJOR and
version::MINOR) and/or add defensive validation in the .pyc loader to detect and
refuse incompatible marshalled bytecode (so when reading .pyc files you check
the tag/magic and fail-safe rather than attempting to execute another
interpreter's bytecode).

In `@crates/vm/src/version.rs`:
- Around line 72-73: Update the PYC_MAGIC_NUMBER constant from 3627 to 3658 to
match CPython 3.14: change the value of PYC_MAGIC_NUMBER in version.rs and
update the accompanying comment to reflect CPython 3.14; then run and adjust any
unit tests or logic that assert or compute against PYC_MAGIC_NUMBER (search for
references to PYC_MAGIC_NUMBER, pyc magic, or functions using that constant) so
they expect 3658 and any derived behavior remains correct.

📥 Commits

Reviewing files that changed from the base of the PR and between d3272e7 and b0b8ede.

📒 Files selected for processing (5)
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/import.rs
  • crates/vm/src/stdlib/marshal.rs
  • crates/vm/src/stdlib/sys.rs
  • crates/vm/src/version.rs

@youknowone

Member Author

Bidirectional .pyc interop evidence

Local CPython 3.14.5 (~/Projects/cpython @ 5607950ef23, "Python 3.14.5") confirms:

  • PYC_MAGIC_NUMBER = 3627 (Include/internal/pycore_magic_number.h:295) — already what we ship
  • Py_MARSHAL_VERSION = 5 (Include/marshal.h:16) — restored FORMAT_VERSION from 4 → 5 to match
  • TYPE_SLICE requires version >= 5 (Python/marshal.c:720) — reverted the slice gate to < 5

Metadata after fix:

marshal.version = 5
cache_tag       = cpython-314
magic token     = 0xa0d0e2b   (== CPython 3.14.5)
importlib magic = 2b0e0d0a    (int 3627, == CPython)
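The author's script is not shown. A sketch that collects the same four values on whichever interpreter runs it could look like this; the dictionary keys simply mirror the labels above. On CPython 3.14 the token should print as 0xa0d0e2b, because `hex()` drops the leading zero.

```python
import importlib.util
import marshal
import sys


def interop_metadata() -> dict:
    """Collect the values compared in the interop check for the running interpreter."""
    magic = importlib.util.MAGIC_NUMBER  # 4 bytes: 2-byte magic + b"\r\n"
    return {
        "marshal.version": marshal.version,
        "cache_tag": sys.implementation.cache_tag,
        "magic token": hex(int.from_bytes(magic, "little")),
        "importlib magic": f"{magic.hex()} (int {int.from_bytes(magic[:2], 'little')})",
    }


for key, value in interop_metadata().items():
    print(f"{key:<15} = {value}")
```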

Demo source

"""Demo module to verify .pyc interop between CPython 3.14 and RustPython."""

GREETING = "Hello from {who}"
PRIMES = [2, 3, 5, 7, 11, 13]

def greet(who: str) -> str:
    return GREETING.format(who=who)

class Counter:
    def __init__(self, start: int = 0) -> None:
        self.value = start

    def tick(self, by: int = 1) -> int:
        self.value += by
        return self.value

def slice_demo():
    s = slice(1, 5, 2)
    return PRIMES[s]

def main():
    print(greet("RustPython"))
    print(greet("CPython"))
    c = Counter(10)
    for _ in range(3):
        c.tick(2)
    print("counter:", c.value)
    print("sliced :", slice_demo())

Headers are byte-identical

CPython pyc header:   2b0e0d0a 00000000 9086126a b1020000
RustPython pyc header: 2b0e0d0a 00000000 9086126a b1020000
Same magic+flags(4b): True
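To decode those 16 bytes: per PEP 552, a .pyc header is four little-endian 32-bit words. Flags 0 means a timestamp-based pyc, so the last two words are the source mtime (0x6a128690) and the source size (0x2b1, or 689 bytes), and both interpreters recorded the same values. A small parser, sketched with only `struct`:

```python
import struct

HEADER = struct.Struct("<4sIII")  # magic, flags, then two words that depend on flags


def parse_pyc_header(data: bytes) -> dict:
    magic, flags, word2, word3 = HEADER.unpack_from(data)
    info = {"magic": magic.hex(), "flags": flags}
    if flags & 0b1:  # PEP 552 bit 0: hash-based pyc
        info["source_hash"] = data[8:16].hex()
        info["check_source"] = bool(flags & 0b10)
    else:  # timestamp-based pyc
        info["mtime"] = word2
        info["source_size"] = word3
    return info
```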

A) RustPython runs CPython-produced .pyc

$ cp demo.cpython314.pyc demo.pyc   # produced by `python3.14 -m py_compile`
$ ./target/release/rustpython -c "import sys; sys.path.insert(0, '.'); import demo; demo.main(); print('loaded:', demo.__spec__.origin)"
Hello from RustPython
Hello from CPython
counter: 16
sliced : [3, 7]
loaded: /private/tmp/pyc_evidence/demo.pyc

B) CPython runs RustPython-produced .pyc

$ cp demo.rustpython.pyc demo.pyc   # produced by `rustpython -m py_compile`
$ python3.14 -c "import sys; sys.path.insert(0, '.'); import demo; demo.main(); print('loaded:', demo.__spec__.origin)"
Hello from RustPython
Hello from CPython
counter: 16
sliced : [3, 7]
loaded: /private/tmp/pyc_evidence/demo.pyc

C) Marshal-level: CPython dumps → RustPython loads → exec

$ python3.14 -c "import marshal; src=compile('def f(x):\n    return [x*p for p in [2,3,5]]\nprint(f(7))','<cp>','exec'); open('cp.marshal','wb').write(marshal.dumps(src))"
$ ./target/release/rustpython -c "import marshal; exec(marshal.load(open('cp.marshal','rb')))"
[14, 21, 35]

D) Marshal-level: RustPython dumps → CPython loads → exec

$ ./target/release/rustpython -c "import marshal; src=compile('class P:\n    def __init__(self,x): self.x=x\n    def double(self): return self.x*2\nprint(P(21).double())','<rp>','exec'); open('rp.marshal','wb').write(marshal.dumps(src))"
$ python3.14 -c "import marshal; exec(marshal.load(open('rp.marshal','rb')))"
42

The demo exercises slices, classes, comprehensions, default args, type annotations, and module-level constants — all round-trip through marshal/.pyc in both directions.

commented by Claude

Use the CPython compatibility version (e.g. cpython-314) instead of
the rustpython-{MAJOR_IMPL}_{MINOR_IMPL} interpreter version string.
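As a sketch of what the new tag means on disk (PEP 3147 layout; helper names are hypothetical): because RustPython now reports the same `cpython-314` tag, both interpreters read and write the same `__pycache__` file name. That is the cross-interpreter concern raised in review, and the reason each interpreter must be able to load the other's .pyc files.

```python
from pathlib import PurePosixPath


def cache_tag(major: int, minor: int) -> str:
    """cpython-{MAJOR}{MINOR}, the format this commit switches to."""
    return f"cpython-{major}{minor}"


def cached_pyc(source: str, tag: str) -> PurePosixPath:
    # PEP 3147 layout: the tag is the only part that differs between interpreters.
    src = PurePosixPath(source)
    return src.parent / "__pycache__" / f"{src.stem}.{tag}.pyc"
```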
Py_MARSHAL_VERSION is 5 in CPython 3.14.5 (Include/marshal.h:16) and
TYPE_SLICE serialization rejects version < 5 (Python/marshal.c:720).
Restore the same threshold and constant so marshal.version and the
slice-marshal gate match CPython.
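A sketch of the version gate, assuming the TYPE_SLICE code ':' and the start/stop/step field order from CPython 3.14's Python/marshal.c (worth checking against a checkout). It encodes only None and 32-bit int fields and never sets FLAG_REF, so it illustrates the gate and the layout rather than replacing marshal.dumps. Per the commit above, CPython 3.14's own `marshal.dumps(slice(1, 5, 2), 4)` is expected to fail while the default version 5 succeeds.

```python
TYPE_NONE, TYPE_INT, TYPE_SLICE = b"N", b"i", b":"  # assumed codes from CPython's marshal.c


def dump_field(value) -> bytes:
    """Encode None or a 32-bit int, the only slice fields this sketch supports."""
    if value is None:
        return TYPE_NONE
    if type(value) is int and -2**31 <= value < 2**31:
        return TYPE_INT + value.to_bytes(4, "little", signed=True)
    raise ValueError(f"unsupported slice field {value!r}")


def dump_slice(s: slice, version: int) -> bytes:
    # Slices only exist in marshal format 5 and later; older versions refuse them.
    if version < 5:
        raise ValueError("slice objects require marshal version >= 5")
    return TYPE_SLICE + b"".join(dump_field(v) for v in (s.start, s.stop, s.step))
```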
Code objects embedded in const-tuples reset the depth budget on each
recursion, so a hostile or pathological marshal stream of code-in-tuple-
in-code can blow the stack despite MAX_MARSHAL_STACK_DEPTH. Pass the
current depth through deserialize_code_inner and read_marshal_const_tuple
and decrement at each code-object/tuple boundary.

Also route dict keys through deserialize_value_after_header so TYPE_CODE
keys decode instead of failing with BadType.
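A minimal Python sketch of threading one depth budget through the recursion, with a hypothetical `MAX_DEPTH` standing in for MAX_MARSHAL_STACK_DEPTH. Small tuples stand in for both const tuples and code objects: each nesting level receives `depth - 1` from its caller instead of a fresh budget, so a code-in-tuple-in-code chain cannot outrun the limit.

```python
MAX_DEPTH = 100  # hypothetical budget; stands in for MAX_MARSHAL_STACK_DEPTH


class Reader:
    """Hypothetical decoder for None and small tuples only."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def read(self, depth: int = MAX_DEPTH):
        # One shared budget: every nested container spends one unit of it.
        if depth <= 0:
            raise ValueError("marshal data nested too deeply")
        code = chr(self.byte())
        if code == "N":
            return None
        if code == ")":
            n = self.byte()
            # Pass depth - 1 down; restarting from MAX_DEPTH in a helper here
            # is the budget reset this commit removes.
            return tuple(self.read(depth - 1) for _ in range(n))
        raise ValueError(f"unsupported type code {code!r}")
```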
@youknowone youknowone merged commit b5ff41c into RustPython:main May 24, 2026
26 checks passed
@youknowone youknowone deleted the marshal branch May 24, 2026 10:30