◐ Shell
reader mode source ↗
Skip to content

Assign opcode ids#6637

Merged
youknowone merged 20 commits into
RustPython:mainfrom
ShaharNaveh:bytecode-opcode-id
Jan 9, 2026
Merged

Assign opcode ids#6637
youknowone merged 20 commits into
RustPython:mainfrom
ShaharNaveh:bytecode-opcode-id

Conversation

@ShaharNaveh

@ShaharNaveh ShaharNaveh commented Jan 3, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • Refactor

    • Internal bytecode instruction names and encodings were reorganized for clearer, strongly-discriminated layouts; this improves consistency in disassembly, error messages, and runtime handling without changing observable behavior.
  • Chores

    • Opcode metadata generation was automated from source to keep tooling and printed opcode names synchronized.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 3, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Refactors the bytecode Instruction enum to use explicit numeric discriminants, renames and reshapes many opcodes (including argument-bearing variants), and propagates those renames across compiler, codegen, VM, JIT, stdlib metadata, tests, and the opcode-generation script.

Changes

Cohort / File(s) Summary
Bytecode enum
crates/compiler-core/src/bytecode.rs
Introduced explicit numeric discriminants and new/renamed variants (e.g., BinaryOp { op }, Call, CallKw, RaiseVarargs, LoadName, StoreName, BinarySubscr, DeleteSubscr, PopExcept, SetupAnnotations, Jump { target }, etc.). Adjusted TryFrom, display/serialization, label_arg, and stack-effect logic to match the new discriminants and shapes.
Code generation
crates/codegen/src/compile.rs
Replaced emission sites to use renamed instructions (Call/CallKw, LoadName, StoreName, StoreSubscr/DeleteSubscr, BinarySubscr, RaiseVarargs, CompareOp, PopExcept, SetupAnnotations, etc.). Many call/raise/name handling sites updated.
VM / Frame execution
crates/vm/src/frame.rs
Updated interpreter match arms to new variants (Call/CallKw, LoadName, StoreName, BinarySubscr, DeleteSubscr, RaiseVarargs, PopExcept, CompareOp, SetupAnnotations, etc.) and consolidated unreachable/default handling for removed placeholders.
JIT & runtime helpers
crates/jit/src/instructions.rs, crates/jit/tests/common.rs
Adjusted internal dispatch and tests to new variant names (Call, CompareOp, LoadName, StoreName), preserving argument shapes where applicable.
Stdlib opcode utilities
crates/stdlib/src/opcode.rs
Updated Opcode::has_name/has_jump checks to reflect renames (LoadName, StoreName, DeleteName) and removed some jump variants from the jump set.
Opcode metadata generator
scripts/generate_opcode_metadata.py
Rewrote generator to parse the Rust Instruction enum directly, derive opcode mapping dynamically, add Opcode NamedTuple with cpython_name, and emit Lib/_opcode_metadata.py instead of relying on hard-coded maps.
Tests / misc
crates/jit/tests/*, various call sites across crates
Propagated many variant renames in tests and call sites; ensure alignment with new discriminants and argument names.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • youknowone
  • arihant2math

Poem

🐇 Hop, hop — bytes align in rows,
Names and numbers find their pose,
Calls and raises, tidy, neat,
Opcodes march with steady feet,
A rabbit cheers: the bytecode grows! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Assign opcode ids' directly and clearly describes the main objective of the changeset: assigning explicit discriminant values to instruction opcodes throughout the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/compiler-core/src/bytecode.rs (2)

592-871: Opcode discriminant assignments and TryFrom mapping look sound

The explicit repr(u8) values and the whitelist in Instruction::try_from are internally consistent: every value admitted by the match corresponds to a real variant (including the 222–255 range), so the mem::transmute::<u8, Instruction> call is sound and there are no “hole” values that would cause UB. From a maintainability perspective, if you expect to churn opcodes often, you might eventually consider an explicit match value { ... => Ok(Variant), _ => Err(..) } mapping instead of a broad range + transmute, but the current approach is fine correctness‑wise.

Also applies to: 875-915


656-662: Align CheckExcMatch documentation with its current behavior

The doc comment for Instruction::CheckExcMatch still describes it as pushing a boolean result (STACK[-2] vs STACK[-1]), but the variant now carries a Label arg, is included in label_arg, and its stack_effect is -2, matching the “duplicate exc + type, then consume both and keep the original exc” pattern used in compile_try_statement. It would be less confusing for future readers if the comment were updated to describe the jump‑on‑no‑match semantics and the fact that it no longer produces a standalone bool on the stack.

Also applies to: 1643-1656, 1761-1762

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0526b9 and f4b8be8.

📒 Files selected for processing (3)
  • crates/codegen/src/compile.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/frame.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/frame.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/codegen/src/compile.rs
🧬 Code graph analysis (2)
crates/vm/src/frame.rs (2)
crates/vm/src/vm/mod.rs (2)
  • value (675-675)
  • value (677-677)
crates/vm/src/exceptions.rs (2)
  • value (426-426)
  • value (590-590)
crates/compiler-core/src/bytecode.rs (1)
crates/compiler-core/src/marshal.rs (1)
  • try_from (79-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (19)
crates/vm/src/frame.rs (10)

557-564: LGTM: Instruction rename is consistent.

The rename from BinarySubscript to BinarySubscr aligns with the PR's opcode standardization objective. The implementation logic remains unchanged.


791-803: LGTM: Semantic clarification with DeleteName.

The rename from DeleteLocal to DeleteName better reflects that this instruction deletes by name from the locals mapping (using ass_subscript), distinguishing it from DeleteFast which operates on the fastlocals array by index. This improves code clarity.


804-804: LGTM: Instruction rename is consistent.

The rename from DeleteSubscript to DeleteSubscr aligns with the opcode standardization.


1350-1358: LGTM: Instruction rename is consistent.

The rename from PopException to PopExcept follows the abbreviated naming convention used elsewhere in the bytecode.


1370-1370: LGTM: Instruction rename is consistent.

The rename from Raise to RaiseVarargs better communicates that this instruction handles variable arguments for raise statements (reraise, raise with cause, etc.).


1418-1418: LGTM: Instruction rename is consistent.

The rename from SetupAnnotation (singular) to SetupAnnotations (plural) is more accurate since the instruction sets up the __annotations__ dictionary which holds multiple annotations.


1487-1492: LGTM: Semantic clarification with StoreName.

The rename from StoreLocal to StoreName better reflects that this instruction stores by name in the locals mapping (using ass_subscript), distinguishing it from StoreFast which stores by index in the fastlocals array. This improves code clarity.


1493-1493: LGTM: Instruction rename is consistent.

The rename from StoreSubscript to StoreSubscr aligns with the opcode standardization and abbreviated naming convention.


530-1578: Ensure code passes cargo fmt and cargo clippy before merging.

Per the coding guidelines for Rust files, run cargo fmt to format the code and cargo clippy to check for lints. Fix any warnings introduced by these changes before final submission.


700-722: CheckExcMatch implementation is correct.

The implementation properly validates that exception types inherit from BaseException, handles both single types and tuples, and leverages is_instance which correctly delegates tuple matching to object_recursive_isinstance. The bytecode instruction is correctly emitted during exception handler compilation, and the logic accurately mirrors Python's exception matching behavior.

crates/compiler-core/src/bytecode.rs (2)

1696-1823: Stack effects for new/renamed opcodes match codegen usage

The updated stack_effect arms for BinarySubscr, Subscript, StoreSubscr, DeleteSubscr, StoreName/DeleteName, CheckEgMatch, CheckExcMatch, RaiseVarargs, EndAsyncFor, SetupAnnotations, etc., line up with how they’re emitted in compile.rs:

  • StoreSubscr => -3 and DeleteSubscr => -2 match the [value, container, index] and [container, index] stack shapes produced by compile_subscript, compile_store, and compile_augassign.
  • BinarySubscr => -1 is used where we duplicate the subject and then index into it in pattern helpers.
  • RaiseVarargs’s -(kind.get(arg) as u8 as i32) matches the number of values the compiler pushes for each RaiseKind (0/1/2).
  • CheckEgMatch => 0 (pop 2, push 2) is consistent with the [orig, list, rest, type] -> [orig, list, new_rest, match] comment in compile_try_star_except.
  • CheckExcMatch(_) => -2 matches the CopyItem + exc_type pattern in compile_try_statement (3 stack entries down to the original exc).

I don’t see any inconsistency between these definitions and the codegen sites.


875-915: Confirm bytecode compatibility expectations after changing opcode values

By pinning explicit repr(u8) discriminants and updating Instruction::try_from accordingly, the binary encoding of bytecode has changed. Any previously serialized CodeUnit/CodeObject (e.g., cached .pyc-like artifacts) using the old opcode layout will now either fail to decode (InvalidBytecode) or, if produced by older code that also used transmute, could be misinterpreted if reused across versions. If RustPython ever persists bytecode across versions, it would be good to confirm that:

  • Either the magic/marshal version is bumped so old caches are rejected explicitly, or
  • You guarantee such caches are never reused across a change like this.

Otherwise, you may get hard‑to‑diagnose failures when loading stale compiled artifacts.

crates/codegen/src/compile.rs (7)

445-502: Subscript codegen matches new BinarySubscr/StoreSubscr/DeleteSubscr semantics

The various subscript paths are internally consistent with the new opcodes and their stack effects:

  • compile_subscript:

    • Load: evaluates value, then slice, and calls Subscript, giving [obj, index] -> result as expected.
    • Store: when used from compile_store, we start with the assigned value on the stack; after compiling value and slice (and BuildSlice when optimizing 2‑element slices) we end up with [value, container, sub], which matches StoreSubscr => -3.
    • Delete: compiles value then slice and calls DeleteSubscr, with [container, sub] matching DeleteSubscr => -2.
  • pattern_helper_sequence_subscr duplicates the subject and uses BinarySubscr with [subject, subject, idx] so the helper keeps the original subject while extracting an element, consistent with BinarySubscr => -1.

  • compile_annotated_assign builds [annotation, __annotations__, name] before StoreSubscr, so __annotations__[name] = annotation is encoded correctly.

  • compile_augassign’s subscript case carefully arranges the stack to [result, container, index] before StoreSubscr, again matching the [value, container, sub] convention.

Given the stack_effect definitions in bytecode.rs, all of these sites look correct.

Also applies to: 1801-1827, 3621-3622, 4484-4511, 4588-4648


1180-1302: NameOp::Name migration to LoadNameAny/StoreName/DeleteName is coherent

Routing NameOp::Name through LoadNameAny / StoreName / DeleteName is consistent with how indices are allocated:

  • name() and get_global_name_index() both index into CodeInfo.metadata.names, which is exactly what these opcodes consume.
  • Module/class bodies (compile_program, compile_program_single, compile_class_body) and class metadata setup (__module__, __qualname__, __doc__, __firstlineno__, __type_params__, __classcell__) all use name() and StoreName, so they’re aligned with the bytecode enum.

As long as the VM side now treats LOAD_NAME_ANY/STORE_NAME/DELETE_NAME as the generic name path (mirroring the old local/name opcodes), behavior should be unchanged.

Also applies to: 2959-3042


1501-1518: RaiseVarargs and exception-matching opcodes integrate cleanly in try/except and assert paths

The switch from Raise to RaiseVarargs and from JumpIfNotExcMatch to CheckExcMatch is wired consistently:

  • Stmt::Raise:

    • RaiseKind::Raise is emitted after compiling one expression (stack_effect -1).
    • RaiseKind::RaiseCause is emitted after compiling two expressions (stack_effect -2).
    • Bare raise uses RaiseKind::Reraise (stack_effect 0), matching the absence of new stack values.
  • In assert lowering, the explicit RaiseVarargs { kind: Raise } matches compiling exactly one exception instance.

  • Classic try/except:

    • For typed handlers, the CopyItem + exc_type + CheckExcMatch(next_handler) pattern creates [exc, exc, type] so that CheckExcMatch can consume the duplicate and type and still leave the original exc for the bound name or PopTop.
    • Unhandled cases jump to the final next_handler block, where RaiseVarargs { kind: Reraise } is emitted, so unmatched exceptions are re‑raised.
  • try/except*:

    • Uses CheckEgMatch with the documented [orig, list, rest, type] -> [orig, list, new_rest, match] shape.
    • SetExcInfo is used before running the handler body so a bare raise picks up the right match.
    • The “no more to reraise” path does PopTop (drop the None result) and PopExcept, while the re‑raise path deliberately omits PopExcept and just executes RaiseVarargs { kind: Raise }, leaving block unwinding to the VM.

These all line up with the stack_effect definitions in bytecode.rs; I don’t see a mismatch.

Also applies to: 1586-1591, 2053-2147, 2178-2420


2923-2957: SetupAnnotations usage remains consistent with annotation detection

The rename to Instruction::SetupAnnotations has been plumbed through all relevant sites:

  • Module and REPL compilation (compile_program, compile_program_single) call find_ann and emit SetupAnnotations only when needed.
  • Class bodies (compile_class_body) also call find_ann and emit SetupAnnotations once at the top when annotations are present.

Since compile_annotated_assign is already gated on !ctx.in_func() for runtime annotation storage, this preserves the prior behavior with the new opcode name.

Also applies to: 1034-1053, 1063-1065, 3019-3022


3821-4022: CompareOp/ComparisonOperator wiring for patterns and comparisons looks correct

The migration to ComparisonOperator + Instruction::CompareOp is coherent:

  • Mapping in compile_addcompare:
    • CmpOp::Eq/NotEq/Lt/LtE/Gt/GtE map to Equal, NotEqual, Less, LessOrEqual, Greater, GreaterOrEqual respectively.
  • Structural pattern matching:
    • Mapping patterns use CompareOp { op: GreaterOrEqual } for the len(subject) >= size and len(subject) >= size-1 checks.
    • Sequence patterns use CompareOp { op: Equal } for len(subject) == size and GreaterOrEqual where a *rest is present.
    • Value patterns use CompareOp { op: Equal } for subject == value.
  • Other relational/inclusion ops (In/NotIn, Is/IsNot) delegate to ContainsOp and IsOp with the right Invert flag.

Given CompareOp has stack effect -1 and the surrounding code always leaves the subject underneath the comparison, these changes look correct.

Also applies to: 4132-4209, 4211-4223, 4361-4376


4484-4517: Annotation storage and augmented assignment now use StoreSubscr consistently

Two key sites now rely on StoreSubscr and are consistent with its stack contract:

  • compile_annotated_assign:

    • For simple name: T outside functions, after compile_annotation the stack is [annotation], then LoadNameAny(__annotations__), then the mangled name string, yielding [annotation, __annotations__, name] before StoreSubscr, which matches the [value, container, index] convention.
  • compile_augassign:

    • The subscript case builds [container, slice], duplicates both to get a copy for loading, uses Subscript to obtain the original value, then performs the inplace binary op with the RHS and finally reorders to [result, container, slice] just before StoreSubscr.

Both cases align with the StoreSubscr stack_effect and with how simple subscript stores are compiled via compile_subscript.

Also applies to: 4588-4648


5443-5630: Async for/comprehensions correctly use EndAsyncFor under the new opcode layout

In both explicit async for loops and comprehensions:

  • compile_for emits EndAsyncFor on the async path after the else block, matching EndAsyncFor => -2 (it cleans up the async iterator state).
  • compile_comprehension emits EndAsyncFor when any generator is async, and uses GetAIter/GetANext + Resume { AfterAwait }, consistent with the updated async for bytecodes.

Given the updated numeric assignment for EndAsyncFor and the unchanged stack_effect, the codegen here still looks correct.

Also applies to: 3331-3377

@youknowone

Copy link
Copy Markdown
Member

#6638 will change instruction a lot. Can this be aligned to that patch?

@ShaharNaveh ShaharNaveh marked this pull request as draft January 6, 2026 08:22

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/compiler-core/src/bytecode.rs (1)

1063-1074: Verify transmute safety with non-contiguous discriminants.

The TryFrom<u8> implementation uses unsafe { core::mem::transmute::<u8, Self>(value) } after checking value <= u8::from(LAST_INSTRUCTION). With explicit discriminants that have gaps (e.g., no opcodes at 0, 3-4, 11, 13, etc.), transmuting these gap values could produce invalid Instruction enum variants, which is undefined behavior.

Consider either:

  1. Filling all gaps with Reserved variants (e.g., Reserved0, Reserved3, Reserved4, etc.)
  2. Using an explicit match instead of transmute to validate each discriminant
🔎 Example of safer implementation using match
 impl TryFrom<u8> for Instruction {
     type Error = MarshalError;
 
     #[inline]
     fn try_from(value: u8) -> Result<Self, MarshalError> {
-        if value <= u8::from(LAST_INSTRUCTION) {
-            Ok(unsafe { core::mem::transmute::<u8, Self>(value) })
-        } else {
-            Err(MarshalError::InvalidBytecode)
-        }
+        // Safe alternative: explicit validation
+        match value {
+            1 => Ok(Self::BeforeAsyncWith),
+            2 => Ok(Self::BeforeWith),
+            5 => Ok(Self::BinarySubscr),
+            // ... other valid opcodes
+            149 => Ok(Self::Resume { arg: Arg::marker() }),
+            _ => Err(MarshalError::InvalidBytecode),
+        }
     }
 }

Note: This requires #[repr(u8)] on the enum and careful maintenance, but is memory-safe.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 153a03a and 5369dab.

📒 Files selected for processing (1)
  • crates/compiler-core/src/bytecode.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/compiler-core/src/bytecode.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
crates/compiler-core/src/bytecode.rs (2)

679-712: Coordinate with PR #6638 before merging.

Based on the PR comments, PR #6638 will significantly change the instruction structure. Since this PR assigns explicit discriminants to the Instruction enum, merging both could create conflicts or require substantial rework.

Recommend coordinating with the author of PR #6638 to determine:

  1. Which PR should merge first
  2. Whether discriminant assignments should be deferred until after #6638
  3. How to handle potential conflicts in the instruction enum structure

679-712: Cache (opcode 0) and Reserved3 (opcode 3) variants appear to be referenced but not defined in the Instruction enum.

The review comment correctly identifies that opcode 0 is missing from the visible enum definition (BeforeAsyncWith starts at 1). However, the code references Cache and Reserved3 in match statements within stack_effect() (line 1846) and the display logic (line 2084), but these variants cannot be located in the enum definition. If not properly defined, this would cause compilation errors.

This discrepancy needs clarification:

  • If Cache and Reserved3 are intentionally omitted per CPython 3.13 spec (which removed CACHE as a separate opcode), they should be removed from match statements.
  • If they should be included, they must be added to the enum with appropriate discriminants (Cache = 0, Reserved3 = 3).

@ShaharNaveh ShaharNaveh marked this pull request as ready for review January 9, 2026 12:50
@ShaharNaveh ShaharNaveh requested a review from youknowone January 9, 2026 12:50

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @crates/codegen/src/compile.rs:
- Around line 1016-1025: The stack effect for PopExcept is incorrectly
documented as 0 but the VM's frame implementation (where PopExcept does let
prev_exc = self.pop_value()) consumes one value; update the stack_effect mapping
for the PopExcept opcode (the entry for PopExcept in your
stack_effect/stack_effects function or match table) from 0 to -1 so the
documented contract matches the VM behavior and downstream stack analysis.
- Around line 1881-1885: The generic-class keyword handling is wrong: instead of
emitting individual STRING constants for each kw name and increasing nargs with
keywords, build a single tuple constant of keyword name strings (use
ConstantData::Tuple) and emit that one tuple onto the stack, and emit the call
with nargs equal to the number of positional arguments only (do not add keywords
to nargs); mirror the pattern in compile_call_inner so the VM's
collect_keyword_args receives a PyTuple at the top of the stack and the
Call/CallKw ABI is satisfied.

In @crates/compiler-core/src/bytecode.rs:
- Around line 679-958: The VM currently ends bytecode dispatch with a catch-all
that calls unreachable!(), so placeholder enum variants in Bytecode (e.g.,
BinaryOpInplaceAddUnicode, BinarySlice, EndFor, LoadLocals, PushNull) can cause
a panic if accidentally executed; change the dispatch default arm to return a
descriptive runtime error (or Err) indicating an invalid/placeholder opcode
rather than panicking, and add an explicit match arm or helper check (e.g.,
is_placeholder_opcode) that logs or maps placeholder discriminants to that
error; additionally, add a validation step in bytecode loading or TryFrom<u8>
conversion for Bytecode to reject placeholder discriminants early (or emit
warnings) and document the retained placeholder variants in the Bytecode enum
comment to clarify they must never be emitted by codegen.

In @crates/vm/src/frame.rs:
- Around line 1782-1784: The match currently uses a catch-all arm with
unreachable!() for the Instruction enum which defeats compile-time
exhaustiveness checking; replace the wildcard arm by explicitly matching every
Instruction variant used in this match so the compiler will error if new
variants are added (i.e., remove the `_ => unreachable!()` arm in the match on
`instruction` and add explicit arms for each `Instruction::<Variant>` handled
here), or if the enum is intended to be extensible annotate the enum definition
with `#[non_exhaustive]` and then handle the `_` case with a documented
conversion/error path; ensure you update the match in the function/method in
frame.rs where `instruction` is matched so future additions to `Instruction`
cause a compile-time failure instead of a runtime panic.

In @scripts/generate_opcode_metadata.py:
- Around line 21-29: The cpython_name property contains multiple return
statements after the intended two-step transformation, making the last two
returns unreachable; edit the cpython_name method to keep the special-case for
"CopyItem" and the intended conversion (assign rust_name to name, apply the two
regex substitutions and return name.upper()), and remove the extraneous return
lines (the duplicate re.sub returns) so only the correct transformation and
single return remain in the cpython_name property.
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode.rs (1)

975-999: Document intentional gaps in opcode ranges.

The TryFrom<u8> implementation validates four separate ranges:

  • CPython opcodes: Cache (0) to YieldValue (118)
  • Resume: 149
  • RustPython-only: Break (119) to Subscript (135)
  • Pseudo opcodes: Jump (252) to PopBlock (255)

This leaves gaps (136-148, 150-251) that will return InvalidBytecode errors. Consider:

  1. Adding a comment in the TryFrom implementation documenting these intentional gaps.
  2. Verifying that deserialization of legacy bytecode won't encounter these gap values.
📝 Add documentation for opcode gaps
     #[inline]
     fn try_from(value: u8) -> Result<Self, MarshalError> {
+        // Valid opcode ranges:
+        // - 0..=118: CPython 3.13 compatible opcodes
+        // - 149: Resume instruction
+        // - 119..=135: RustPython-specific opcodes
+        // - 252..=255: Pseudo opcodes (not emitted in bytecode)
+        // Gaps (136-148, 150-251) are reserved for future use
         let cpython_start = u8::from(Self::Cache);
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5369dab and 258f61c.

⛔ Files ignored due to path filters (3)
  • Lib/_opcode_metadata.py is excluded by !Lib/**
  • Lib/test/test__opcode.py is excluded by !Lib/**
  • Lib/test/test_dis.py is excluded by !Lib/**
📒 Files selected for processing (7)
  • crates/codegen/src/compile.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/jit/src/instructions.rs
  • crates/jit/tests/common.rs
  • crates/stdlib/src/opcode.rs
  • crates/vm/src/frame.rs
  • scripts/generate_opcode_metadata.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/opcode.rs
  • crates/jit/src/instructions.rs
  • crates/jit/tests/common.rs
  • crates/vm/src/frame.rs
  • crates/codegen/src/compile.rs
  • crates/compiler-core/src/bytecode.rs
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_opcode_metadata.py
🧠 Learnings (1)
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`

Applied to files:

  • scripts/generate_opcode_metadata.py
🧬 Code graph analysis (2)
crates/codegen/src/compile.rs (1)
crates/compiler-core/src/bytecode.rs (8)
  • from (394-396)
  • from (437-439)
  • from (964-967)
  • from (1044-1046)
  • from (1050-1052)
  • try_from (974-999)
  • try_from (1020-1025)
  • try_from (1034-1040)
crates/compiler-core/src/bytecode.rs (3)
crates/vm/src/exceptions.rs (4)
  • value (438-438)
  • value (602-602)
  • w (1775-1775)
  • w (1776-1776)
crates/vm/src/frame.rs (3)
  • arg (2646-2647)
  • name (1322-1322)
  • name (2638-2638)
crates/codegen/src/compile.rs (1)
  • name (1210-1212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
🔇 Additional comments (16)
crates/compiler-core/src/bytecode.rs (2)

1763-1914: LGTM! Stack effect calculations properly updated.

The stack effect calculations have been correctly updated to use the renamed instruction variants (LoadName, StoreName, DeleteName, Call, CallKw, RaiseVarargs, etc.). The logic for each instruction's stack manipulation appears correct.


1982-2104: LGTM! Disassembly formatting correctly updated.

The fmt_dis method has been properly updated to display the renamed instruction variants with their appropriate names (e.g., BINARY_SUBSCR, DELETE_SUBSCR, LOAD_NAME, STORE_NAME, CALL, CALL_KW, RAISE_VARARGS, etc.).

scripts/generate_opcode_metadata.py (2)

55-81: LGTM! Generation logic correctly parses and outputs opcode metadata.

The script properly:

  1. Reads the Rust source file
  2. Extracts the Instruction enum body
  3. Parses opcodes with their discriminants
  4. Generates sorted output in CPython 3.13 compatible format

The dynamic parsing approach eliminates the need for manual updates when opcodes change.


32-37: Python version requirement is acceptable for this development script.

This is a development-only script located in the scripts/ directory used to generate opcode metadata. The script already indicates compatibility with CPython 3.13 in its header comment, and the project's whats_left.py explicitly requires Python 3.13+. Since itertools.batched was introduced in Python 3.12, its use is fully compatible with the project's stated Python version requirements.

No action needed.

crates/jit/tests/common.rs (1)

99-105: LGTM! Test instruction handling correctly updated.

The test stack machine has been properly updated to handle the renamed instruction variants:

  • LoadNameAnyLoadName
  • StoreLocalStoreName

The logic remains the same, only the variant names changed.

crates/jit/src/instructions.rs (2)

457-476: LGTM! JIT call instruction updated.

The JIT compiler correctly handles the renamed Call instruction (previously CallFunctionPositional). The implementation logic remains unchanged.


492-545: LGTM! Comparison instruction renamed.

The CompareOp instruction (previously CompareOperation) is correctly handled in the JIT with proper type conversions and condition code mappings for both integer and float comparisons.

crates/stdlib/src/opcode.rs (1)

66-83: LGTM! Opcode name classification correctly updated.

The has_name function has been properly updated to recognize the renamed instruction variants:

  • DeleteLocalDeleteName
  • LoadNameAnyLoadName
  • StoreLocalStoreName

All three renamed variants correctly appear in the pattern match for instructions that use co_names.

crates/vm/src/frame.rs (1)

394-394: LGTM! Instruction variant renames are consistent.

The renamed instruction variants align well with standard Python bytecode naming conventions:

  • RaiseVarargs, BinarySubscr, CallKw, Call, CompareOp, DeleteName, DeleteSubscr, LoadName, PopExcept, SetupAnnotations, StoreName, and StoreSubscr are all clearer and more conventional names.

The logic remains unchanged—these are purely API surface updates.

Also applies to: 613-613, 714-718, 756-756, 825-825, 838-838, 1195-1195, 1431-1431, 1460-1460, 1562-1562, 1624-1624, 1630-1630

crates/codegen/src/compile.rs (7)

1818-1819: The kind mapping correctly covers all three raise forms (bare raise, raise exc, raise exc from cause), and the encoding aligns with the RaiseKind enum and opcode metadata. No issues found.


5222-5227: Annotation assignment, augassign, and normal call paths updated to new ops—monitor for stack effect regressions

All three locations (lines 5222–5227, 5363–5364, 6111–6115) correctly use the updated instructions (LoadName, StoreSubscr, Call, CallKw). Run the full test suite plus targeted cases (subscripting stores, __annotations__ in class/module, kwargs calls, exception handling) to verify stack effects before merge.


1345-1346: SetupAnnotations instruction verified: correctly gated and scoped

SetupAnnotations is properly emitted in all three entry points (module, function, class) using consistent gating with Self::find_ann(), which recursively detects annotated assignments. The runtime implementation correctly initializes the __annotations__ dict only when needed. No silent breakage risk identified—scopes align with Python semantics and the instruction behavior is sound.


1597-1602: LoadName/StoreName/DeleteName scope semantics are correct

LoadNameAny does not exist in the codebase. The current implementation correctly emits LoadName for NameOp::Name, which handles the proper LEGB lookup (Locals → Enclosing → Globals → Builtins): it first tries self.locals.mapping().subscript(name, vm), and on KeyError falls back to self.load_global_or_builtin(name, vm). This is the correct behavior for module and class scopes. StoreName and DeleteName similarly use locals.mapping() for storage/deletion, which is appropriate for these contexts. No behavior regression exists.


4338-4339: Pattern-matching opcode renames (BinarySubscr / CompareOp / DeleteSubscr) are properly defined and handled
All three opcodes are correctly defined in the Instruction enum (BinarySubscr = 5, DeleteSubscr = 9, CompareOp with op field), with proper VM dispatch handlers in frame.rs, correct stack effect tracking, and consistent bytecode serialization. All usages in compile.rs align with the defined opcode metadata.


497-510: Subscript opcodes are properly defined with documented stack semantics

The StoreSubscr (opcode 39, -3 stack effect), DeleteSubscr (opcode 9, -2 stack effect), and BinarySubscr (opcode 5, -1 stack effect) instructions are well-defined in crates/compiler-core/src/bytecode.rs with clear stack effect documentation. The usage patterns in compile.rs are correct: for store operations, the container and index are on the stack followed by the value; for delete operations, the container and index are consumed. No stack order issues detected.


3564-3596: Verify LoadName semantics for .type_params access in PEP 695 generic class contexts

While .type_params is registered with the TYPE_PARAM flag (making it a bound symbol), the code uses LoadName instructions to access it rather than LoadDeref. This appears to work because:

  1. In builtins.rs (line 1040), .type_params is explicitly set in the class function's namespace dictionary before execution
  2. The compile_name function has special handling for TypeParams scope that allows fallback to parent scope lookup
  3. At runtime, LoadName succeeds by finding .type_params in the namespace dict

However, this relies on runtime namespace provisioning rather than the symbol scope mechanism. Confirm that:

  • .type_params is guaranteed to be in the namespace dict in all PEP 695 execution paths
  • No optimization or refactoring removes the .type_params namespace entry before the class code executes
  • The fallback logic in compile_name consistently resolves .type_params correctly across all TypeParams scope access patterns (lines 3589-3590, 3679-3680, 3690, and others)

@youknowone youknowone left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

👍

Hide details View details @youknowone youknowone merged commit c2bfdf3 into RustPython:main Jan 9, 2026
13 checks passed
terryluan12 pushed a commit to terryluan12/RustPython that referenced this pull request Jan 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants