Bytecode parity - direct loop backedges#7578

youknowone

Summary by CodeRabbit

Bug Fixes
- Fixed loop return stack cleanup behavior to correctly align with CPython's iterator-slot management.
Chores
- Enhanced compiler bytecode generation pipeline with improved jump threading and block reordering optimizations.
- Updated bytecode inspection tools to use raw bytecode representation for accurate analysis.

coderabbitai

📝 Walkthrough

Walkthrough

The pull request modifies the RustPython compiler's intermediate representation and bytecode generation. Changes include: refactoring loop return stack cleanup in compile.rs, extending the CFG transformation pipeline in ir.rs with new jump threading and block reordering utilities, and updating dis_dump.py to disable instruction normalization for raw bytecode parity.

Changes

Cohort / File(s)	Summary
Compiler loop and control flow `crates/codegen/src/compile.rs`	Updated `Compiler::unwind_fblock` to emit `Swap { i: 2 }` followed by `PopTop` for `ForLoop` returns when `preserve_tos` is set, instead of `PopIter`, to match CPython's iterator-slot cleanup. Added two bytecode-structure tests validating jump threading around `STORE_SUBSCR` and loop return control-flow ordering.
CFG transformation and jump threading pipeline `crates/codegen/src/ir.rs`	Added `set_to_nop` helper for consistent NOP conversion. Extended jump threading into `jump_threading_unconditional` variant and refactored original into `jump_threading_impl`. Introduced new CFG pipeline stage with `reorder_conditional_exit_and_jump_blocks` and `reorder_conditional_jump_and_exit_blocks` predicates. Added `duplicate_jump_targets_without_lineno` to improve line-number resolution around jump-only blocks.
Bytecode instruction normalization `scripts/dis_dump.py`	Disabled instruction normalization and superinstruction decomposition by clearing `SKIP_OPS` and `_OPNAME_NORMALIZE`, and removing `_SUPER_DECOMPOSE` logic. Updated jump-target mapping to operate on raw (un-normalized, un-decomposed) bytecode indices.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Bytecode parity - exception #7557: Modifies scripts/dis_dump.py instruction normalization and jump-target index mapping to use raw bytecode stream indices instead of decomposed/normalized indices.

Poem

🐰 The loop returns now reorder clean,
Jump threads race through bytecode stream,
Stack swaps dance where cleanups gleam,
Blocks reorder, CFG supreme! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Bytecode parity - direct loop backedges' directly relates to the main changes in the pull request, which focus on fixing loop backedge bytecode generation and adding jump threading for unconditional jumps to achieve bytecode parity with CPython.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 11158-11163: Rename the misspelled identifier kwonlydefaults in
the test fixture to a cspell-friendly name (e.g., kw_only_defaults) by updating
the parameter name in the function f signature and every use inside the function
body (references to kwonlydefaults and checks like "if kwonlydefaults and kwarg
in kwonlydefaults" should use the new name); ensure all other test fixtures or
assertions that reference this parameter are updated consistently (function f,
parameters kwonlyargs and arg2value, and any access/assignment to the defaults
mapping).

In `@crates/codegen/src/ir.rs`:
- Around line 2737-2763: The code inspects the target block directly
(blocks[target.idx()]) which can be an empty placeholder and miss trampoline
chains; instead, before checking the target's first non-NOP instruction, resolve
the real target by threading through empty blocks using next_nonempty_block
(i.e. call next_nonempty_block(blocks, target) and update target to that
returned BlockIdx, skipping if it becomes BlockIdx::NULL) so the subsequent
logic that computes target_jump/target_ins operates on the first non-empty
block; retain existing checks for include_conditional, is_conditional_jump, and
other conditions.
- Around line 3551-3562: When cloning the jump-only block you adjust
predecessors for the trampoline itself but forget to increment the predecessor
count for the trampoline's downstream target (old_next), causing stale fan-in;
after you set new_block.next = old_next, push the new predecessor for the cloned
block (predecessors.push(1)), also increment predecessors[old_next.idx()] by one
(with a bounds/existence check if necessary) so the successor's incoming edge
count reflects the new clone — reference symbols: new_block, old_next, new_idx,
predecessors, target, blocks, propagate_locations_in_block.

In `@scripts/dis_dump.py`:
- Around line 182-193: The current loop that builds filtered and offset_to_idx
(iterating over raw and using normalized_idx with _SUPER_DECOMPOSE and
_OPNAME_NORMALIZE) produces a logical-instruction index map, not a
raw-bytecode-slot map; update the loop that builds filtered/offset_to_idx to
account for cache/prefix slots by inspecting each instruction's cache_info (or
otherwise using a low-level bytecode API) so offset_to_idx maps actual raw
bytecode slots used by compare_bytecode.py, e.g., when inst.cache_info indicates
extra cache entries increment normalized_idx and reserve corresponding slot
indices before mapping inst.offset; modify the code near the
filtered/offset_to_idx construction (the loop over raw, SKIP_OPS handling, and
uses of _SUPER_DECOMPOSE/_OPNAME_NORMALIZE) to synthesize those cache/prefix
slots or switch to a bytecode-level API.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 273c6c67-97cb-475e-9240-bd988f06cbe8

📥 Commits

Reviewing files that changed from the base of the PR and between a49ce5b and e372959.

📒 Files selected for processing (3)

crates/codegen/src/compile.rs
crates/codegen/src/ir.rs
scripts/dis_dump.py

youknowone force-pushed the bytecode-parity-jump branch from 35f5642 to 34a5afb Compare April 10, 2026 08:29

Bytecode parity - direct loop backedges

e372959

youknowone force-pushed the bytecode-parity-jump branch from 34a5afb to e372959 Compare April 10, 2026 09:03

youknowone marked this pull request as ready for review April 10, 2026 09:36

coderabbitai Bot reviewed Apr 10, 2026

View reviewed changes

youknowone deleted the bytecode-parity-jump branch April 10, 2026 09:49

This was referenced Apr 11, 2026

Bytecode parity - for-loop cleanup, return duplication, late jump threading fix #7580

Merged

Inline with-suppress return blocks and extend return duplication #7622

Merged

This was referenced Apr 30, 2026

Bytecode parity CFG normalization and jump cleanup #7721

Merged

Align bytecode CFG cleanup with CPython #7781

Merged

This was referenced May 16, 2026

Bytecode parity - CFG reorders and LOAD_FAST_BORROW chain #7870

Merged

Align nested code object bytecode parity #7942

Merged

Align codegen metadata with CPython #7952

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bytecode parity - direct loop backedges#7578