Align codegen passes and opcode metadata with CPython by youknowone · Pull Request #7987 · RustPython/RustPython
added 30 commits
read_marshal_bytes, _str, _str_vec, _name_tuple, and _const_tuple now take a shared ref table and resolve TYPE_REF / register FLAG_REF entries. deserialize_code is split into a public wrapper and an inner function that receives the ref table; deserialize_value_depth opens a fresh inner ref space when it hits Type::Code, mirroring CPython's behaviour of putting the code object itself at ref slot 0. Nested code objects inside const tuples reuse the surrounding code's ref space via the new read_const_value helper.
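The shared ref table described above can be illustrated with a toy reader. This is a minimal sketch, not the RustPython implementation: `read_value` is a hypothetical name, only four marshal type codes are handled, and the fresh inner ref space opened for nested code objects is not modelled. The type codes and the `FLAG_REF` bit are the ones CPython's marshal format uses.

```python
import struct

FLAG_REF = 0x80               # high bit of the type byte: register this object
TYPE_REF = ord("r")           # back-reference: 4-byte index into the ref table
TYPE_INT = ord("i")           # 4-byte signed little-endian integer
TYPE_SHORT_ASCII = ord("z")   # 1-byte length followed by ASCII bytes
TYPE_SMALL_TUPLE = ord(")")   # 1-byte item count followed by the items


def read_value(buf, pos, refs):
    """Decode one value starting at buf[pos]; return (value, new_pos).

    `refs` is the ref table shared by every reader in the same ref space.
    """
    header = buf[pos]
    pos += 1
    type_code = header & 0x7F
    wants_ref = bool(header & FLAG_REF)

    if type_code == TYPE_REF:
        # Resolve a back-reference to an object registered earlier.
        (index,) = struct.unpack_from("<I", buf, pos)
        return refs[index], pos + 4

    if type_code == TYPE_SMALL_TUPLE:
        count = buf[pos]
        pos += 1
        slot = None
        if wants_ref:
            # Reserve the slot before reading items so later indices line up.
            slot = len(refs)
            refs.append(None)
        items = []
        for _ in range(count):
            item, pos = read_value(buf, pos, refs)
            items.append(item)
        value = tuple(items)
        if slot is not None:
            refs[slot] = value
        return value, pos

    if type_code == TYPE_INT:
        (value,) = struct.unpack_from("<i", buf, pos)
        pos += 4
    elif type_code == TYPE_SHORT_ASCII:
        length = buf[pos]
        value = buf[pos + 1 : pos + 1 + length].decode("ascii")
        pos += 1 + length
    else:
        raise ValueError(f"unsupported type byte {header:#x}")

    if wants_ref:
        refs.append(value)
    return value, pos
```

Feeding it a two-item tuple whose first item carries `FLAG_REF` and whose second item is a `TYPE_REF` to slot 0 yields the same string object twice.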
… 3.14 PYC_MAGIC_NUMBER changes from 2994 to 3627, matching CPython 3.14's pyc_magic_number_token (0x0a0d0e2b). marshal FORMAT_VERSION drops from 5 to 4 (the encoder/marshal.version value; the decoder already accepts both). check_pyc_magic_number_bytes now compares all four magic bytes instead of the first two.
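The relationship between the magic number and its four-byte token can be checked with a short sketch. The helper names below are hypothetical and only mirror the idea behind check_pyc_magic_number_bytes; the layout (a 2-byte little-endian number followed by `\r\n`) is the one CPython uses for .pyc headers.

```python
PYC_MAGIC_NUMBER = 3627  # value quoted in the commit message above


def magic_bytes(number):
    """Return the 4-byte .pyc magic: 2-byte little-endian number plus CR LF."""
    return number.to_bytes(2, "little") + b"\r\n"


def check_first_two(header, number):
    """Old-style check: only the two number bytes are compared."""
    return header[:2] == magic_bytes(number)[:2]


def check_all_four(header, number):
    """Stricter check: the trailing CR LF must match as well."""
    return header[:4] == magic_bytes(number)


# Reading the four bytes back as a little-endian u32 gives the token.
token = int.from_bytes(magic_bytes(PYC_MAGIC_NUMBER), "little")
```

A header whose first two bytes match but whose last two are corrupted passes the two-byte check and fails the four-byte one, which is the case the stricter comparison is meant to catch.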
SourceFileLoader.get_code now also looks for .pyc files using
_RP_FALLBACK_CACHE_TAGS (currently ('cpython-314',)) in addition to
sys.implementation.cache_tag. The matched .pyc is only used for
reading; recompilation still writes to the RustPython-tagged path, so
CPython's .pyc is never overwritten. Source-stat / hash / timestamp
validation logic is unchanged.
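A rough sketch of the read-only fallback lookup follows. All function names are hypothetical, and "rustpython-314" is a placeholder for whatever sys.implementation.cache_tag reports; only the ('cpython-314',) fallback tuple comes from the commit message. Validation of the matched file is left out.

```python
import os

FALLBACK_CACHE_TAGS = ("cpython-314",)  # mirrors _RP_FALLBACK_CACHE_TAGS


def candidate_pyc_paths(source_path, own_tag, fallback_tags=FALLBACK_CACHE_TAGS):
    """List __pycache__ paths to try, the interpreter's own tag first."""
    directory, filename = os.path.split(source_path)
    stem = os.path.splitext(filename)[0]
    cache_dir = os.path.join(directory, "__pycache__")
    tags = [own_tag] + [tag for tag in fallback_tags if tag != own_tag]
    return [os.path.join(cache_dir, f"{stem}.{tag}.pyc") for tag in tags]


def pick_pyc_for_reading(source_path, own_tag, exists=os.path.exists):
    """Return the first candidate that exists, or None if there is none."""
    for path in candidate_pyc_paths(source_path, own_tag):
        if exists(path):
            return path
    return None


def pyc_path_for_writing(source_path, own_tag):
    """Writes always target the interpreter's own tag, never a fallback."""
    return candidate_pyc_paths(source_path, own_tag, fallback_tags=())[0]
```

Keeping the read and write paths in separate helpers makes the "never overwrite CPython's .pyc" property easy to see: the write helper has no access to the fallback tags.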
CPython's marshal supports TYPE_SLICE from format version 4 onwards and that is the default version. Rejecting slice dumps below version 5 made marshal.dumps(slice(...)) fail with the default version and broke test.test_marshal.SliceTestCase.test_slice.
Lib/importlib/_bootstrap_external.py is CPython's own code copied verbatim; local patches here defeat compatibility tracking. The cpython-XX cache_tag fallback needs to live on the RustPython side (Rust code or sys.implementation.cache_tag policy), not as edits to the imported standard library. This reverts commit 1fc426d0fb5fcdb50d35cad13bbb43e8f6ce1c7f.
Py_MARSHAL_VERSION is 5 in CPython 3.14.5 (Include/marshal.h:16) and TYPE_SLICE serialization rejects version < 5 (Python/marshal.c:720). Restore the same threshold and constant so marshal.version and the slice-marshal gate match CPython.
Code objects embedded in const-tuples reset the depth budget on each recursion, so a hostile or pathological marshal stream of code-in-tuple-in-code can blow the stack despite MAX_MARSHAL_STACK_DEPTH. Pass the current depth through deserialize_code_inner and read_marshal_const_tuple and decrement at each code-object/tuple boundary. Also route dict keys through deserialize_value_after_header so TYPE_CODE keys decode instead of failing with BadType.
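The depth-threading fix can be sketched on a toy tree instead of a real marshal stream. Everything here is illustrative: `decode`, `nest`, `DepthExceeded`, and the limit of 8 are assumptions, not names or values from the patch. The point is only that each nested container receives the remaining budget rather than a fresh one.

```python
MAX_DEPTH = 8  # stand-in for MAX_MARSHAL_STACK_DEPTH; the real value differs


class DepthExceeded(ValueError):
    """Raised when nesting goes deeper than the budget allows."""


def decode(node, depth=MAX_DEPTH):
    """Decode a toy tree of ("code" | "tuple", [children]) nodes and leaves."""
    if not isinstance(node, tuple):
        return node  # leaves cost nothing
    if depth == 0:
        raise DepthExceeded("nesting too deep")
    kind, children = node
    # Hand the remaining budget down at every code/tuple boundary.
    # Passing MAX_DEPTH here instead would reproduce the original bug.
    decoded = [decode(child, depth - 1) for child in children]
    return {"kind": kind, "children": decoded}


def nest(levels):
    """Build alternating tuple-in-code nesting `levels` deep around a leaf."""
    node = 0
    for i in range(levels):
        node = ("code" if i % 2 else "tuple", [node])
    return node
```

With this shape, `decode(nest(MAX_DEPTH))` succeeds and one more level of nesting raises `DepthExceeded`, regardless of how code objects and tuples alternate.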
Rename CFG helpers and accessors to the names used in CPython's compile.c (basicblock_next_instr, basicblock_last_instr, basicblock_append_instructions, bb_has_fallthrough, is_jump, make_cfg_traversal_stack, mark_warm/mark_cold, etc.). Drop the unused boolop-folding gate, mark_cpython_cfg_label_block helper, and ComprehensionLoopControl::iter_range field. Track an is_coroutine flag on SymbolTable, set in async def, await, and async comprehensions, and propagate it through non-generator comprehensions per symtable_handle_comprehension(). Mark SetupCleanup/SetupFinally/SetupWith as has_arg pseudo-ops, mark ForIter as a terminator, and add has_arg/has_const on AnyInstruction. Fix Instruction::stack_effect_jump to delegate to the opcode's stack_effect_jump rather than stack_effect.
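The stack_effect_jump fix in the last sentence amounts to delegating to the right method. A minimal sketch, assuming a made-up opcode whose effects differ between the fallthrough and jump paths (the class shapes and the numbers are illustrative, not RustPython's types):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opcode:
    name: str
    fallthrough_effect: int  # stack delta when execution falls through
    jump_effect: int         # stack delta when the jump is taken

    def stack_effect(self, oparg):
        return self.fallthrough_effect

    def stack_effect_jump(self, oparg):
        return self.jump_effect


@dataclass(frozen=True)
class Instruction:
    opcode: Opcode
    oparg: int = 0

    def stack_effect(self):
        return self.opcode.stack_effect(self.oparg)

    def stack_effect_jump(self):
        # Delegate to the opcode's jump variant. Calling
        # self.opcode.stack_effect here is the bug the commit describes.
        return self.opcode.stack_effect_jump(self.oparg)


# Hypothetical opcode with made-up effects, used only for illustration.
toy_branch = Instruction(Opcode("TOY_BRANCH", fallthrough_effect=1, jump_effect=-1))
```

For any opcode whose two effects are equal the bug is invisible, which is why it only shows up on branch-like instructions during stack-depth analysis.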
Add CNOTAB, LNOTAB, ialloc, ioffset, iused, nblocks, ncellsused, ncellvars, nextop, noffsets, nvars, swaptimize, untargeted to .cspell.dict/cpython.txt for the new CFG/assembler code in crates/codegen/src/ir.rs.
- clippy: drop redundant `test_` prefix on three test functions and remove an unnecessary `u32` cast in basicblock_clear_reuses_cpython_spare_slots_in_offset_order
- insta: regenerate nested_double_async_with snapshot to match the new CFG output that drops unreferenced labels after the redundant-NOP pass
- regrtest: drop `@expectedFailure` markers from test_func_args, test_meth_args (test_compile), test_disassemble_with, test_disassemble_try_finally (test_dis), and test_except_star (test_monitoring) which now pass
Empty conf.toml since WithExceptStart and Setup{Cleanup,Finally,With}
stack effects already match CPython, so the TODO override entries are
stale and only cause CI hook diffs.
Regenerate opcode_metadata.rs and drop the matching SetupCleanup/
SetupFinally/SetupWith assertions on PseudoOpcode::has_arg(); their
`HAS_ARG` flag comes from pseudo definitions in bytecodes.c that the
upstream analyzer does not propagate through PseudoInstruction.properties,
so the generated has_arg() excludes them. has_target() still covers
these block-push pseudos via is_block_push().
The CPython invariant `assert(OPCODE_HAS_ARG(op) || !IS_BLOCK_PUSH(op))`
relies on SETUP_{FINALLY,CLEANUP,WITH} carrying `HAS_ARG_FLAG` in
CPython's metadata. The autogen tool reads pseudo-opcode properties from
target instructions and does not propagate the pseudo's own
HAS_ARG flag, so PseudoOpcode::has_arg() omits these three opcodes.
Drop the debug_assert that fired inside py_freeze proc-macro expansion.
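Why the assertion fired can be seen by evaluating the invariant against two metadata tables. This is a toy model: the sets below are hand-written for illustration (LOAD_CONST is included only as an example of an opcode that takes an argument), and `invariant_violations` is a hypothetical helper, not part of the generator.

```python
# Block-push pseudo-ops named in the commit message.
BLOCK_PUSH = {"SETUP_FINALLY", "SETUP_CLEANUP", "SETUP_WITH"}


def invariant_violations(has_arg, opcodes):
    """Return opcodes that break `has_arg(op) or not is_block_push(op)`."""
    return sorted(op for op in opcodes if op in BLOCK_PUSH and op not in has_arg)


ALL_OPS = BLOCK_PUSH | {"LOAD_CONST"}

# CPython-style metadata: the pseudo-ops carry the HAS_ARG flag themselves.
cpython_has_arg = BLOCK_PUSH | {"LOAD_CONST"}

# Generated metadata as described above: the pseudo's own flag is not
# propagated, so the three block-push ops are missing from has_arg().
generated_has_arg = {"LOAD_CONST"}
```

Under the first table the invariant holds; under the second, all three block-push pseudo-ops violate it, so an assertion written against the invariant cannot pass with the generated has_arg().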
…st via macro
Add fn_has_eval_break to generate_rs_opcode_metadata.py using CPython's Properties.eval_breaker, removing the hand-written matches! body for Opcode::has_eval_break and PseudoOpcode::has_eval_break. Forward has_arg/has_const from Instruction and PseudoInstruction to their opcode, so AnyInstruction can use either_real_pseudo! like the other has_* accessors instead of an open-coded match.
youknowone changed the title from "compiler parity" to "Align codegen passes and opcode metadata with CPython"