marshal#7467
Conversation
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] test: cpython/Lib/test/test_marshal.py (TODO: 7) dependencies: dependent tests: (21 tests)
Legend:
|
Sorry, something went wrong.
📝 WalkthroughWalkthroughRefactors varint/exception-table encoding, rewrites marshal serialization/deserialization with depth/ref tracking and CPython field ordering, reworks oparg enums (MakeFunctionFlag, ComparisonOperator), adds CodeUnits specialization-disable, and small VM API/trait adjustments. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Call site
participant VM as VM stdlib marshal
participant Core as compiler-core marshal
participant File as file-like object
Caller->>VM: marshal.dumps/dump(args: allow_code, depth)
VM->>Core: serialize_value / write_object (tracks refs, enforces depth)
Core-->>VM: marshaled bytes (may include FLAG_REF)
VM->>File: write(bytes) / read(bytes) for load
File-->>VM: bytes read (positioned after consumed bytes)
VM->>Core: deserialize_value_depth (uses FLAG_REF, MAX_MARSHAL_STACK_DEPTH)
Core-->>VM: PyObjectRef
VM-->>Caller: return object or bytes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
170d8fe to
1a0c0ce
Compare
March 20, 2026 14:38
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 570-576: The current disable_specialization() only writes
UNREACHABLE_BACKOFF into adaptive_counters but CodeUnits::quicken() later
unconditionally reinitializes cacheable sites, undoing the disable; update the
implementation so that quicken() respects the non-specializable state: either
add a persistent flag on the code object (e.g., non_specializable) set by
disable_specialization() and check that flag in CodeUnits::quicken() before
initializing warmup counters at RESUME, or have quicken() inspect each adaptive
counter and skip reinitialization when the counter already equals
UNREACHABLE_BACKOFF; ensure you reference disable_specialization,
adaptive_counters, UNREACHABLE_BACKOFF, and CodeUnits::quicken() in your changes
so the disable persists across quicken().
In `@crates/compiler-core/src/bytecode/oparg.rs`:
- Around line 385-389: The current encoding mixes bit positions and CPython
power-of-two bitmasks, causing ambiguity in TryFrom<u32> and in
From<MakeFunctionFlag> for u32; change the conversions so TryFrom<u32> decodes
CPython-style masks (treat input as bitmask and match 1,2,4,8,16,32
specifically, returning Err for non-power-of-two or unknown values) and make
From<MakeFunctionFlag> for u32 emit the CPython bitmask (1 << position) rather
than the raw position; if you need to support position-based values keep or add
a distinct TryFrom<u8> (or a helper like from_position) to decode stored
positions, and adjust any uses of the bitflag! macro or internal storage to
consistently treat the enum as representing a position with conversions handling
mask<->position via (1 << position). Ensure unique symbols updated: TryFrom<u32>
impl for MakeFunctionFlag, From<MakeFunctionFlag> for u32, and any helper
from_position/from_mask functions.
In `@crates/compiler-core/src/marshal.rs`:
- Around line 236-254: The current code reconstructs localspluskinds using
split_localplus(...) and build_localspluskinds_from_split(...), which strips the
CO_FAST_HIDDEN bit; instead preserve hidden flags by merging them back from the
original localspluskinds: after computing lp via split_localplus and before
assigning localspluskinds for the CodeObject, iterate the original
localspluskinds and reapply CO_FAST_HIDDEN to the corresponding entries in the
rebuilt localspluskinds (matching by index/name), or simply reuse the original
localspluskinds where possible so serialize_code(...) retains CO_FAST_HIDDEN;
update the block around split_localplus, build_localspluskinds_from_split, and
the localspluskinds assignment to perform this merge.
- Around line 481-518: deserialize_value_depth currently reserves a None
placeholder for flagged containers, but only sets refs[idx] after
deserialize_value_typed returns, so recursive backrefs to a container being
built see None and fail; fix by reserving and inserting a live placeholder value
into refs before calling deserialize_value_typed so TYPE_REF can resolve to the
in-progress container. Concretely, modify deserialize_value_depth to create an
empty/container placeholder via the Bag API (or add a
Bag::make_placeholder/mutable container constructor) and store Some(placeholder)
at refs[idx] prior to calling deserialize_value_typed, then have
deserialize_value_typed fill/populate that placeholder in-place (or accept the
slot index to mutate refs[idx]) instead of replacing it after the fact; ensure
Type::try_from and the slot handling logic in deserialize_value_typed support
in-place population for lists/dicts/sets to allow recursive references to
resolve.
- Around line 279-345: The helpers read_marshal_bytes, read_marshal_str,
read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple are
stripping FLAG_REF and not reserving/referring into the global marshal ref
table, and read_marshal_const_tuple calls deserialize_value in a way that builds
a fresh refs vector for each element; this breaks v3+ streams where TYPE_REF
indices are global. Fix by making these readers handle FLAG_REF correctly: when
reading the element type byte check for FLAG_REF and if present reserve a slot
in the shared refs table before deserializing and use the reserved index for ref
entries instead of stripping the flag; propagate a mutable refs container into
read_marshal_* functions (and into deserialize_value) so tuples use the same
refs vector for all elements; remove any local per-element refs creation in
read_marshal_const_tuple and instead pass the shared refs through each call so
TYPE_REFs decode consistently across the whole code object. Ensure functions
read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple
accept &mut refs (or a context that owns it) and that FLAG_REF handling is
centralized where type_byte is consumed.
In `@crates/compiler-core/src/varint.rs`:
- Around line 62-80: read_varint_with_start (and the other varint reader)
currently returns Some(val) even when the input ends while the continuation bit
is still set; update the loop exit logic to detect an unterminated varint and
return None in that case. Concretely: after the while loop in
read_varint_with_start (and analogously in the other reader), if cont is still
true (meaning we exited because *pos >= data.len()), return None instead of
Some(val); otherwise return Some(val). This ensures malformed/unterminated
varints are treated as failures.
- Around line 35-50: write_varint_be currently only emits up to five 6-bit
chunks and thus drops bits for values >= 1 << 30; change it to emit all 6-bit
groups until the remaining value fits in 6 bits (i.e., use a loop instead of
fixed conditionals) so the top bits are preserved for full u32 range.
Specifically, in write_varint_be replace the fixed if-chain with a loop that
repeatedly computes group = (val >> shift) & 0x3f or repeatedly shifts the
remaining value, pushes 0x40 | group for all non-final groups and pushes the
final (val & 0x3f) without the continuation bit, and then return the number of
bytes written; keep the function signature and return behavior the same.
In `@crates/vm/src/stdlib/marshal.rs`:
- Around line 519-549: The load function currently calls read() with no size
which slurps the rest of the stream; change load to read only as much as the
deserializer needs by feeding a streaming buffer to marshal::deserialize_value
instead of pulling the entire file: replace the
ArgBytesLike::try_from_object(read_res) / buf approach with a small-read loop
(using the file object's read(size) or readinto-like API) that appends bytes to
a Vec<u8> and repeatedly calls deserialize_value on a &[u8] slice when more data
is required, only calling tell/seek after you know how many bytes were consumed;
keep the existing error mapping and check_no_code(&result, vm) logic and
reference the same symbols (load, marshal::deserialize_value, ArgBytesLike,
tell/seek, check_no_code) when making the change.
- Around line 420-473: The code is calling unwrap() on container insertions
which can panic on malformed input; update make_set (remove
set.add(...).unwrap()), make_frozenset (remove
PyFrozenSet::from_iter(...).unwrap()), and make_dict (remove
dict.set_item(...).unwrap()) to handle insertion errors cleanly instead of
panicking: detect failures from set.add, PyFrozenSet::from_iter, and
dict.set_item and return an appropriate Err(marshal::MarshalError) (or convert
the underlying Python error into a MarshalError) so malformed/unhashable
elements produce a controlled error from make_set/make_frozenset/make_dict
rather than causing a Rust panic.
- Around line 552-580: check_no_code currently skips PySlice, allowing code
objects in slice.start/stop/step to bypass allow_code=false; update
check_no_code to detect PySlice (crate::builtins::PySlice) and recursively call
check_no_code on its start, stop, and step components (similar to how
PyTuple/PyList/PyDict are handled) so that slice(start, stop, step) cannot
smuggle code objects through dumps/loads; ensure you use the slice's accessor
methods consistent with other container handling and keep error behavior
identical when a PyCode is found.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 523278ff-f428-42e4-813f-fb5cc5ab398b
⛔ Files ignored due to path filters (1)
Lib/test/test_marshal.pyis excluded by!Lib/**
📒 Files selected for processing (7)
crates/compiler-core/src/bytecode.rscrates/compiler-core/src/bytecode/oparg.rscrates/compiler-core/src/marshal.rscrates/compiler-core/src/varint.rscrates/vm/src/builtins/code.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/types/slot.rs
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
crates/compiler-core/src/marshal.rs (3)
921-934: ⚠️ Potential issue | 🟠 MajorRebuilding
co_localspluskindshere still drops hidden flags.This reconstruction only emits
LOCAL/CELL/FREE, so metadata already preserved incode.localspluskinds—for exampleCO_FAST_HIDDEN—is discarded on the next marshal write. Since Line 252 now keeps the original bytes, serialize those bytes back out here instead of regenerating the mask.💡 Minimal fix
- let total_lp = code.varnames.len() + cell_only_names.len() + code.freevars.len(); - let mut kinds = Vec::with_capacity(total_lp); - for v in code.varnames.iter() { - let mut k: u8 = CO_FAST_LOCAL; - if local_cell_set.contains(v.as_ref()) { - k |= CO_FAST_CELL; - } - kinds.push(k); - } - kinds.extend(core::iter::repeat_n(CO_FAST_CELL, cell_only_names.len())); - kinds.extend(core::iter::repeat_n(CO_FAST_FREE, code.freevars.len())); buf.write_u8(Type::Bytes as u8); - write_vec(buf, &kinds); + debug_assert_eq!(total_lp_count, code.localspluskinds.len()); + write_vec(buf, &code.localspluskinds);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 921 - 934, The current reconstruction of locals+kind masks (loop over code.varnames, extend repeats for cell_only_names and code.freevars) discards hidden/other flags present in the original code.localspluskinds; instead of regenerating the mask, emit the preserved byte sequence saved earlier (code.localspluskinds) back into the buffer — replace the creation/population of kinds and the write_vec(&kinds) with writing the original code.localspluskinds bytes via the same Type::Bytes/write_vec path so hidden flags (e.g. CO_FAST_HIDDEN) are retained.
502-516: ⚠️ Potential issue | 🔴 CriticalReserved slots still cannot satisfy backrefs to in-progress containers.
refs.push(None)reserves the index, but the slot is only populated afterdeserialize_value_typed()returns. If a list/dict/set/frozenset contains aTYPE_REFback to itself while it is being built, the lookup still seesNoneand fails withInvalidBytecode, so recursive containers will not round-trip. This needs a live placeholder inserted before descending and then filled in place.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 502 - 516, The reserved-slot approach pushes None then fills it after deserialize_value_typed, which still prevents TYPE_REF lookups during recursive descent; change this to insert a live placeholder into refs (when flag is true) before calling deserialize_value_typed so backrefs can resolve to the in-progress container, then after deserialization replace that placeholder in-place with the final value (avoid using a cloned value for the slot so backrefs point to the same instance). Update the logic around refs, slot, and the call to deserialize_value_typed to create/fill a placeholder per container and ensure the final assignment mutates refs[idx] to the completed value rather than leaving a None or a clone.
278-344: ⚠️ Potential issue | 🔴 CriticalStill decoding code-object fields outside the shared marshal ref table.
These helpers all clear
FLAG_REFlocally, so flagged field values never reserve a global ref slot, andread_marshal_const_tuple()recreatesrefsfor each element viadeserialize_value(rdr, bag). In a valid v3+ stream,TYPE_REFs insideco_consts,co_names, or later code-object fields will then read the wrong slot and desynchronize every following reference. Thread the caller’s sharedrefsstate throughdeserialize_code()and these helpers instead of consuming the type byte here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 278 - 344, The helpers read_marshal_bytes, read_marshal_str, read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple are clearing FLAG_REF and consuming the type byte locally, so flagged objects never reserve entries in the shared refs table and subsequent TYPE_REFs get desynchronized; thread the caller's shared refs state through deserialize_code and these helper functions (add a &mut refs parameter or equivalent to their signatures), stop stripping FLAG_REF or consuming the type byte inside these helpers (let deserialize_code/read-dispatch handle the initial type/FLAG_REF logic and reserve ref slots), and make deserialize_value use the shared &mut refs (and not recreate refs or take Bag by value) so all co_consts/co_names/code-object fields use the same global refs table.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 606-637: In the Type::Dict branch the code peels raw and calls
deserialize_value_typed(rdr, bag, d, refs, key_type) which bypasses the depth
guard and can underflow when depth == 1; change the key-path to use the same
guarded entry used for values by calling deserialize_value_depth(rdr, bag, d,
refs) (or, if you must keep typed deserialization, explicitly check if d == 0
and reject composite key types like tuples/frozensets) so that depth is not
decremented twice; update the logic around reading FLAG_REF and refs (the
key_slot handling and refs[idx] assignment) to work with the
deserialize_value_depth flow and preserve the existing ref semantics.
---
Duplicate comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 921-934: The current reconstruction of locals+kind masks (loop
over code.varnames, extend repeats for cell_only_names and code.freevars)
discards hidden/other flags present in the original code.localspluskinds;
instead of regenerating the mask, emit the preserved byte sequence saved earlier
(code.localspluskinds) back into the buffer — replace the creation/population of
kinds and the write_vec(&kinds) with writing the original code.localspluskinds
bytes via the same Type::Bytes/write_vec path so hidden flags (e.g.
CO_FAST_HIDDEN) are retained.
- Around line 502-516: The reserved-slot approach pushes None then fills it
after deserialize_value_typed, which still prevents TYPE_REF lookups during
recursive descent; change this to insert a live placeholder into refs (when flag
is true) before calling deserialize_value_typed so backrefs can resolve to the
in-progress container, then after deserialization replace that placeholder
in-place with the final value (avoid using a cloned value for the slot so
backrefs point to the same instance). Update the logic around refs, slot, and
the call to deserialize_value_typed to create/fill a placeholder per container
and ensure the final assignment mutates refs[idx] to the completed value rather
than leaving a None or a clone.
- Around line 278-344: The helpers read_marshal_bytes, read_marshal_str,
read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple are
clearing FLAG_REF and consuming the type byte locally, so flagged objects never
reserve entries in the shared refs table and subsequent TYPE_REFs get
desynchronized; thread the caller's shared refs state through deserialize_code
and these helper functions (add a &mut refs parameter or equivalent to their
signatures), stop stripping FLAG_REF or consuming the type byte inside these
helpers (let deserialize_code/read-dispatch handle the initial type/FLAG_REF
logic and reserve ref slots), and make deserialize_value use the shared &mut
refs (and not recreate refs or take Bag by value) so all
co_consts/co_names/code-object fields use the same global refs table.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 16fbaa0f-e75b-46d4-8cec-fce352c0476f
📒 Files selected for processing (2)
crates/compiler-core/src/marshal.rscrates/compiler-core/src/varint.rs
✅ Files skipped from review due to trivial changes (1)
- crates/compiler-core/src/varint.rs
Sorry, something went wrong.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
crates/compiler-core/src/varint.rs (1)
121-129: Consider adding edge-case test values.The roundtrip test covers common values but doesn't test the upper boundary. Adding
u32::MAXand values near1 << 30would validate the full 32-bit range handling.💡 Suggested addition
- for &val in &[0u32, 1, 63, 64, 127, 128, 4095, 4096, 1_000_000] { + for &val in &[0u32, 1, 63, 64, 127, 128, 4095, 4096, 1_000_000, 1 << 30, u32::MAX] {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/varint.rs` around lines 121 - 129, Update the test_be_varint_roundtrip test to include edge-case u32 values such as u32::MAX and values around 1 << 30 (for example (1 << 30) - 1 and 1 << 30) so the big-endian varint encoder/decoder (write_varint_be and read_varint) is validated across the full 32-bit boundary; add those values to the list iterated in the test and keep the same round-trip assertions (assert_eq!(read_varint(&buf, &mut pos), Some(val)) and assert_eq!(pos, buf.len())).crates/compiler-core/src/marshal.rs (1)
609-644: Dict deserialization logic is correct but duplicates ref handling.The key-path (lines 619-638) manually handles
TYPE_REFandFLAG_REFinstead of usingdeserialize_value_depth. While functionally correct, this duplicates the ref-handling logic. The depth guard is properly maintained since both keys and values used = depth - 1.💡 Consider using deserialize_value_depth for keys
This would reduce code duplication:
loop { // Peek type byte for null check let raw = rdr.read_u8()?; if raw & !FLAG_REF == b'0' { break; } // Rewind or use a different approach to let deserialize_value_depth handle the byte // Alternatively, accept the current implementation as-is since it works }Since rewinding isn't straightforward with the current
Readtrait, the current approach is acceptable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 609 - 644, The Dict deserialization duplicates reference-handling logic for keys; refactor the key-path to let deserialize_value_depth handle both normal and TYPE_REF cases by peeking the next type byte instead of consuming it: replace the read_u8() call with a non-consuming peek using BufRead::fill_buf (or wrap rdr in a BufReader) to check for the terminator (raw & !FLAG_REF == b'0'), and then call deserialize_value_depth(rdr, bag, d, refs, key_type) (or the existing deserialize_value_depth overload) to read the key so ref bookkeeping is centralized; update the function signature or wrap the reader as needed so rdr supports BufRead and remove the manual TYPE_REF / FLAG_REF branch in the key handling block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 609-644: The Dict deserialization duplicates reference-handling
logic for keys; refactor the key-path to let deserialize_value_depth handle both
normal and TYPE_REF cases by peeking the next type byte instead of consuming it:
replace the read_u8() call with a non-consuming peek using BufRead::fill_buf (or
wrap rdr in a BufReader) to check for the terminator (raw & !FLAG_REF == b'0'),
and then call deserialize_value_depth(rdr, bag, d, refs, key_type) (or the
existing deserialize_value_depth overload) to read the key so ref bookkeeping is
centralized; update the function signature or wrap the reader as needed so rdr
supports BufRead and remove the manual TYPE_REF / FLAG_REF branch in the key
handling block.
In `@crates/compiler-core/src/varint.rs`:
- Around line 121-129: Update the test_be_varint_roundtrip test to include
edge-case u32 values such as u32::MAX and values around 1 << 30 (for example (1
<< 30) - 1 and 1 << 30) so the big-endian varint encoder/decoder
(write_varint_be and read_varint) is validated across the full 32-bit boundary;
add those values to the list iterated in the test and keep the same round-trip
assertions (assert_eq!(read_varint(&buf, &mut pos), Some(val)) and
assert_eq!(pos, buf.len())).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 0cc38ce0-d090-4c65-9b73-110903f04d03
⛔ Files ignored due to path filters (1)
Lib/test/test_marshal.pyis excluded by!Lib/**
📒 Files selected for processing (7)
crates/compiler-core/src/bytecode.rscrates/compiler-core/src/bytecode/oparg.rscrates/compiler-core/src/marshal.rscrates/compiler-core/src/varint.rscrates/vm/src/builtins/code.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/types/slot.rs
✅ Files skipped from review due to trivial changes (1)
- crates/vm/src/stdlib/marshal.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/builtins/code.rs
Sorry, something went wrong.
Unify marshal to a single CPython-compatible format. No separate "cpython_marshal" reader — one format for frozen modules, .pyc files, and the Python-level marshal module. - ComparisonOperator: `(cmp_index << 5) | mask` matching COMPARE_OP - MakeFunctionFlag: bit-position matching SET_FUNCTION_ATTRIBUTE - Exception table varint: big-endian (matching Python/assemble.c) - Linetable varint: little-endian (unchanged) - Integer: TYPE_INT (i32) / TYPE_LONG (base-2^15 digits) - Code objects: CPython field order (argcount, posonlyargcount, ..., co_localsplusnames, co_localspluskinds, ..., co_exceptiontable) - FLAG_REF / TYPE_REF for object deduplication (version >= 3) - allow_code keyword argument on dumps/loads/dump/load - Subclass rejection (int/float/complex/tuple/list/dict/set/frozenset) - Slice serialization (version >= 5) - Buffer protocol fallback for memoryview/array - Recursion depth limit (2000) for both reads and writes - Streaming load (reads one object, seeks file position) - TYPE_INT64, TYPE_FLOAT (text), TYPE_COMPLEX (text) for compat serialize_code writes co_localsplusnames/co_localspluskinds from split varnames/cellvars/freevars. deserialize_code splits them back. Cell variable DEREF indices are translated between flat (wire) and cell-relative (internal) representations in both directions. Replace bitwise trick with match for new ComparisonOperator values. 21 -> 3 expected failures. Remaining: test_bad_reader (IO layer), test_deterministic_sets (PYTHONHASHSEED), testIntern (string interning).
- Use original localspluskinds from marshal data instead of rebuilding, preserving CO_FAST_HIDDEN and other flags - Fix write_varint_be to handle values >= 2^30 (add 6th chunk) - Remove unused build_localspluskinds_from_split
Prevents usize underflow when dict key deserialization path calls deserialize_value_typed with depth=0 on composite types.
There was a problem hiding this comment.
♻️ Duplicate comments (4)
crates/compiler-core/src/marshal.rs (2)
205-233: ⚠️ Potential issue | 🔴 CriticalCode-object field readers still bypass the shared marshal ref table.
deserialize_code()consumesco_code/co_consts/co_names/co_localsplus*/ name fields through helpers that stripFLAG_REF, andread_marshal_const_tuple()restarts ref tracking by callingdeserialize_value()per element. In v3+ streams the ref table is global across the whole code object, so validTYPE_REFs here will either fail outright or shift every later index.Also applies to: 278-344
480-517: ⚠️ Potential issue | 🔴 CriticalRecursive
FLAG_REFcontainers still cannot resolve back-references.
deserialize_value_depth()reservesNonefor a flagged container and only fills the slot after the full value is built. A back-reference encountered while reading that same container hitsNoneand returnsInvalidBytecode, so recursive list/dict/set round-trips are still broken.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 480 - 517, deserialize_value_depth currently pushes None for FLAG_REF slots so back-references to the same container fail; instead reserve and insert a provisional placeholder Some(placeholder) into refs before calling deserialize_value_typed so any back-ref during child deserialization can see the in-progress container, then replace or finalize refs[idx] with the completed value after deserialize_value_typed returns. Modify the FLAG_REF branch in deserialize_value_depth (where refs.push(None) is done and slot is set) to create and push a placeholder Bag::Value (or use a Bag-provided placeholder factory), pass refs into deserialize_value_typed as before, and on completion assign refs[idx] = Some(value) (or mutate the placeholder into the final value) to preserve recursive list/dict/set round-trips.crates/compiler-core/src/bytecode/oparg.rs (1)
385-388: ⚠️ Potential issue | 🟠 Major
AnnotateandTypeParamsdo not agree between docs and conversions.The doc block says
TypeParams=0x10/Annotate=0x20, while the enum positions plusFrom<MakeFunctionFlag> for u32emitAnnotate -> 0x10andTypeParams -> 0x20. If the comment reflects the intended CPython layout, these two flags still round-trip incorrectly; otherwise the docs need to be flipped to match the implementation.In current CPython opcode metadata, what bit values does SET_FUNCTION_ATTRIBUTE use for defaults, kwdefaults, annotations, closure, __annotate__, and type parameters?Also applies to: 407-428
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode/oparg.rs` around lines 385 - 388, The doc comment for `SET_FUNCTION_ATTRIBUTE` bitmask disagrees with the enum/ conversion: update either the documentation or the conversions so `Annotate` and `TypeParams` match CPython order; specifically inspect the `bitflag!` declaration and the `MakeFunctionFlag` enum plus the `From<MakeFunctionFlag> for u32` implementation and make the mapping consistent (if CPython uses TypeParams=0x10 and Annotate=0x20, change the enum/From impl to emit those values; otherwise swap the doc values to match the existing `From` mapping), and apply the same correction to the other occurrences in the block around the `SET_FUNCTION_ATTRIBUTE` docs and related conversions (lines referencing `Annotate`, `TypeParams`, `bitflag!`, and `From<MakeFunctionFlag> for u32`).crates/compiler-core/src/bytecode.rs (1)
570-576: ⚠️ Potential issue | 🟠 Major
disable_specialization()still does not survive first execution.This only seeds
UNREACHABLE_BACKOFF, butCodeUnits::quicken()later rewrites cacheable sites with warmup counters unconditionally. Code marked non-specializable here will still be re-enabled atRESUME; the opt-out needs to persist across quickening.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 570 - 576, disable_specialization currently only seeds adaptive_counters with UNREACHABLE_BACKOFF but CodeUnits::quicken() later overwrites cacheable sites with warmup counters, so the opt-out doesn't persist; update the design so disable_specialization() also marks sites as permanently non-specializable (e.g. set a per-site flag or change the cache-kind metadata for those sites) and then modify CodeUnits::quicken() (and any RESUME handling) to check that marker and skip replacing those sites with warmup counters, rather than unconditionally writing warmup values to adaptive_counters.
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode.rs (1)
74-76: Add a multi-byte exception-table round-trip case.The current tests only cover single-byte varints, so this
write_varint_be()change is never exercised across a byte boundary. A case withstart,size, ortargetabove0x3fwould catch a reader/writer endian mismatch immediately.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 74 - 76, Add a test exercising multi-byte varints for exception-table round-trip to catch big-endian vs little-endian mismatches: create an exception-table entry with at least one of start, size, or target > 0x3f (e.g. 0x40 or larger), serialize it using write_varint_be (the same logic that writes start/size/target/depth_lasti) then deserialize with the reader path and assert equality; target areas to update are the exception-table round-trip test that currently only uses single-byte values (look for tests referencing write_varint_be, exception-table, start, size, target) so that the writer/reader endian handling is exercised across multi-byte varint boundaries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 570-576: disable_specialization currently only seeds
adaptive_counters with UNREACHABLE_BACKOFF but CodeUnits::quicken() later
overwrites cacheable sites with warmup counters, so the opt-out doesn't persist;
update the design so disable_specialization() also marks sites as permanently
non-specializable (e.g. set a per-site flag or change the cache-kind metadata
for those sites) and then modify CodeUnits::quicken() (and any RESUME handling)
to check that marker and skip replacing those sites with warmup counters, rather
than unconditionally writing warmup values to adaptive_counters.
In `@crates/compiler-core/src/bytecode/oparg.rs`:
- Around line 385-388: The doc comment for `SET_FUNCTION_ATTRIBUTE` bitmask
disagrees with the enum/ conversion: update either the documentation or the
conversions so `Annotate` and `TypeParams` match CPython order; specifically
inspect the `bitflag!` declaration and the `MakeFunctionFlag` enum plus the
`From<MakeFunctionFlag> for u32` implementation and make the mapping consistent
(if CPython uses TypeParams=0x10 and Annotate=0x20, change the enum/From impl to
emit those values; otherwise swap the doc values to match the existing `From`
mapping), and apply the same correction to the other occurrences in the block
around the `SET_FUNCTION_ATTRIBUTE` docs and related conversions (lines
referencing `Annotate`, `TypeParams`, `bitflag!`, and `From<MakeFunctionFlag>
for u32`).
In `@crates/compiler-core/src/marshal.rs`:
- Around line 480-517: deserialize_value_depth currently pushes None for
FLAG_REF slots so back-references to the same container fail; instead reserve
and insert a provisional placeholder Some(placeholder) into refs before calling
deserialize_value_typed so any back-ref during child deserialization can see the
in-progress container, then replace or finalize refs[idx] with the completed
value after deserialize_value_typed returns. Modify the FLAG_REF branch in
deserialize_value_depth (where refs.push(None) is done and slot is set) to
create and push a placeholder Bag::Value (or use a Bag-provided placeholder
factory), pass refs into deserialize_value_typed as before, and on completion
assign refs[idx] = Some(value) (or mutate the placeholder into the final value)
to preserve recursive list/dict/set round-trips.
---
Nitpick comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 74-76: Add a test exercising multi-byte varints for
exception-table round-trip to catch big-endian vs little-endian mismatches:
create an exception-table entry with at least one of start, size, or target >
0x3f (e.g. 0x40 or larger), serialize it using write_varint_be (the same logic
that writes start/size/target/depth_lasti) then deserialize with the reader path
and assert equality; target areas to update are the exception-table round-trip
test that currently only uses single-byte values (look for tests referencing
write_varint_be, exception-table, start, size, target) so that the writer/reader
endian handling is exercised across multi-byte varint boundaries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 663a9a7c-fe8c-46f7-8831-b0a4973ada80
⛔ Files ignored due to path filters (1)
Lib/test/test_marshal.pyis excluded by!Lib/**
📒 Files selected for processing (7)
crates/compiler-core/src/bytecode.rscrates/compiler-core/src/bytecode/oparg.rscrates/compiler-core/src/marshal.rscrates/compiler-core/src/varint.rscrates/vm/src/builtins/code.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/types/slot.rs
✅ Files skipped from review due to trivial changes (2)
- crates/vm/src/builtins/code.rs
- crates/vm/src/types/slot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/compiler-core/src/varint.rs
- crates/vm/src/stdlib/marshal.rs
Sorry, something went wrong.
8c01615
into
RustPython:main
Mar 23, 2026
* CPython-compatible marshal format Unify marshal to a single CPython-compatible format. No separate "cpython_marshal" reader — one format for frozen modules, .pyc files, and the Python-level marshal module. - ComparisonOperator: `(cmp_index << 5) | mask` matching COMPARE_OP - MakeFunctionFlag: bit-position matching SET_FUNCTION_ATTRIBUTE - Exception table varint: big-endian (matching Python/assemble.c) - Linetable varint: little-endian (unchanged) - Integer: TYPE_INT (i32) / TYPE_LONG (base-2^15 digits) - Code objects: CPython field order (argcount, posonlyargcount, ..., co_localsplusnames, co_localspluskinds, ..., co_exceptiontable) - FLAG_REF / TYPE_REF for object deduplication (version >= 3) - allow_code keyword argument on dumps/loads/dump/load - Subclass rejection (int/float/complex/tuple/list/dict/set/frozenset) - Slice serialization (version >= 5) - Buffer protocol fallback for memoryview/array - Recursion depth limit (2000) for both reads and writes - Streaming load (reads one object, seeks file position) - TYPE_INT64, TYPE_FLOAT (text), TYPE_COMPLEX (text) for compat serialize_code writes co_localsplusnames/co_localspluskinds from split varnames/cellvars/freevars. deserialize_code splits them back. Cell variable DEREF indices are translated between flat (wire) and cell-relative (internal) representations in both directions. Replace bitwise trick with match for new ComparisonOperator values. 21 -> 3 expected failures. Remaining: test_bad_reader (IO layer), test_deterministic_sets (PYTHONHASHSEED), testIntern (string interning). * Address code review: preserve CO_FAST_HIDDEN, fix varint overflow - Use original localspluskinds from marshal data instead of rebuilding, preserving CO_FAST_HIDDEN and other flags - Fix write_varint_be to handle values >= 2^30 (add 6th chunk) - Remove unused build_localspluskinds_from_split * Add depth guard to deserialize_value_typed Prevents usize underflow when dict key deserialization path calls deserialize_value_typed with depth=0 on composite types.
Summary by CodeRabbit
New Features
Improvements