◐ Shell
reader mode source ↗
Skip to content

Autogen opcodes metadata#7983

Merged
youknowone merged 4 commits into
RustPython:mainfrom
ShaharNaveh:autogen-opcodes-metadata
May 27, 2026
Merged

Autogen opcodes metadata#7983
youknowone merged 4 commits into
RustPython:mainfrom
ShaharNaveh:autogen-opcodes-metadata

Conversation

@ShaharNaveh

@ShaharNaveh ShaharNaveh commented May 26, 2026

Copy link
Copy Markdown
Contributor

Inspired by #7797 (comment)

Summary by CodeRabbit

  • Chores
    • Restructured opcode metadata generation system with improved build pipeline integration
    • Separated metadata generation into distinct development stages for better workflow efficiency
    • Updated pre-commit hook configuration for enhanced code generation processes
    • Adjusted compiler recursion limit settings

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR refactors RustPython's opcode metadata generation infrastructure. The changes migrate from a monolithic legacy approach (single generate.py script reading opcode.toml) to a modular, CPython-aware system. Macro-driven Rust instruction definitions now generate core behavior; separate Python tools produce both Rust and Python opcode metadata. CI and pre-commit workflows are updated to support the new generation pathways.

Changes

Opcode Metadata Generation Refactor

Layer / File(s) Summary
Utility Foundations and Configuration
tools/opcode_metadata/utils.py, tools/opcode_metadata/conf.toml
Stack-effect data structures, case conversion helpers (to_pascal_case, to_upper_snake_case), TOML configuration loading, and Rust enum body extraction utilities.
CPython Opcode Analysis Bridge
tools/opcode_metadata/cpython.py
Dynamically loads CPython's opcode analyzer via CPYTHON_ROOT environment variable, merges instruction and pseudo-instruction analysis, and provides unified get_analysis() entrypoint.
Rust Opcode Definition Parsing
tools/opcode_metadata/opcodes.py
Parses define_opcodes!(…) macro blocks from Rust source, extracts opcode metadata, matches against CPython analysis data, applies override configuration, and computes deoptimization targets via OpcodeInfo and Opcode dataclasses.
Rust Opcode Metadata Generation
tools/opcode_metadata/generate_rs_opcode_metadata.py
Reads parsed opcode definitions, generates Rust impl blocks with metadata accessors (numeric conversions, cache entries, flag predicates, stack-effect computation, and variant conversions), formats via rustfmt, and writes to crates/compiler-core/src/bytecode/opcode_metadata.rs.
Generated Opcode Metadata Output
crates/compiler-core/src/bytecode/opcode_metadata.rs, .gitattributes
Generated file containing Opcode and PseudoOpcode metadata methods: as_u8/as_u16, cache_entries, deopt, predicate helpers (has_arg, has_const, has_free, has_jump, has_local, has_name, is_instrumented), stack-effect computation, and variant conversions (to_base, to_instrumented, try_from_u8/try_from_u16). Marked as generated in .gitattributes.
Python Opcode Metadata Generation
tools/opcode_metadata/generate_py_opcode_metadata.py
Generates Lib/_opcode_metadata.py for CPython 3.14 compatibility by aggregating opcode definitions, building _specializations mapping, computing HAVE_ARGUMENT and MIN_INSTRUMENTED_OPCODE constants, and emitting prioritized opmap.
Instruction Module Macro Refactoring
crates/compiler-core/src/bytecode/instruction.rs
Refactors to use define_opcodes! macro for generating Opcode and PseudoOpcode enums and their conversion/predicate methods. Explicitly implements PseudoOpcode::is_scope_exit (returns false) and is_unconditional_jump (restricted to Jump/JumpNoInterrupt). Updates AnyInstruction and AnyOpcode delegation via either_real_pseudo! to use macro-generated methods. Adds Eq and PartialEq to AnyOpcode trait derives.
Bytecode Module Reorganization
crates/compiler-core/src/bytecode.rs, crates/compiler-core/src/bytecode/oparg.rs
Re-exports Instruction/Opcode/PseudoInstruction/PseudoOpcode from instruction submodule instead of separate instructions module. Declares opcode_metadata module instead of instructions. Updates oparg.rs to import Instruction from crate::bytecode directly.
Rust Compiler Configuration
crates/compiler-core/src/lib.rs
Adds #![recursion_limit = "256"] crate attribute to support macro expansion.
CI and Pre-commit Workflow Updates
.github/workflows/ci.yaml, .pre-commit-config.yaml
ci.yaml: installs prek separately, reads CPython version from .python-version, clones the matching CPython tag, and runs prek for manual hooks with CPYTHON_ROOT set. Adds explicit persist-credentials: false to checkout. .pre-commit-config.yaml: replaces single generate-opcode-metadata hook with two separate hooks (generate-rs-opcode-metadata and generate-py-opcode-metadata), each pointing to its dedicated generator script with proper serial execution and ordering.
Legacy Code Removal
Deleted: crates/compiler-core/generate.py, crates/compiler-core/opcode.toml, scripts/generate_opcode_metadata.py
Removes obsolete files: generate.py (old Rust generation with OpcodeGen and InstructionGen classes), opcode.toml (old metadata definitions), and scripts/generate_opcode_metadata.py (old Python generation).

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • RustPython/RustPython#7797: Earlier PR in the opcode/instruction autogeneration refactor that updates the compiler-core model and generation inputs.
  • RustPython/RustPython#7573: Overlapping changes to instruction.rs macro generation and bytecode module re-exports.
  • RustPython/RustPython#7757: Related changes to AnyInstruction/PseudoInstruction helper methods and block/pseudo-opcode identification logic.

Suggested Reviewers

  • youknowone
  • fanninpm

🐰 The rabbits hopped through fields of code,
Where opcodes once lived, now macros flow,
Generated metadata shines so bright,
Separate paths for Python and for Rust—
A refactored garden, unified and true!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title "Autogen opcodes metadata" directly describes the main change in the changeset: the PR implements autogenerated opcode metadata across the repository. The title is concise and accurately represents the primary objective.
Docstring Coverage ✅ Passed Docstring coverage is 83.78% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/opcode.py
[ ] lib: cpython/Lib/_opcode_metadata.py
[x] test: cpython/Lib/test/test__opcode.py
[x] test: cpython/Lib/test/test_opcodes.py

dependencies:

  • opcode (native: _opcode, builtins)
    • _opcode_metadata
    • _opcode_metadata

dependent tests: (47 tests)

  • opcode: test__opcode test_code test_compile test_dis test_peepholer
    • dis: test_ast test_compiler_assemble test_dtrace test_fstring test_inspect test_monitoring test_opcache test_patma test_positional_only_arg test_type_cache
      • bdb: test_bdb
      • inspect: test_abc test_argparse test_asyncgen test_buffer test_builtin test_clinic test_collections test_coroutines test_decimal test_enum test_functools test_generators test_grammar test_ntpath test_operator test_posixpath test_pydoc test_signal test_sqlite3 test_traceback test_type_annotations test_types test_typing test_unittest test_yield_from test_zipimport test_zipimport_support test_zoneinfo
      • modulefinder: test_importlib test_modulefinder
      • trace: test_trace

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@ShaharNaveh ShaharNaveh marked this pull request as ready for review May 26, 2026 15:18

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 9

🧹 Nitpick comments (2)
tools/opcode_metadata/cpython.py (1)

5-8: ⚡ Quick win

Chain the exception in the except KeyError path.

Use raise ... from err to preserve the original failure context and satisfy B904.

Suggested fix
-try:
-    CPYTHON_ROOT = pathlib.Path(os.environ["CPYTHON_ROOT"]).expanduser().resolve()
-except KeyError:
-    raise ValueError("Missing environment variable 'CPYTHON_ROOT'")
+try:
+    CPYTHON_ROOT = pathlib.Path(os.environ["CPYTHON_ROOT"]).expanduser().resolve()
+except KeyError as err:
+    raise ValueError("Missing environment variable 'CPYTHON_ROOT'") from err

As per coding guidelines, "Use ruff for linting Python code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/opcode_metadata/cpython.py` around lines 5 - 8, The except KeyError
block that raises ValueError for missing CPYTHON_ROOT should chain the original
KeyError so the original context is preserved; modify the except clause to
capture the exception (e.g., except KeyError as err) and re-raise the ValueError
using "raise ValueError(\"Missing environment variable 'CPYTHON_ROOT'\") from
err" so the CPYTHON_ROOT resolution code
(pathlib.Path(os.environ["CPYTHON_ROOT"]).expanduser().resolve()) preserves the
original error context for lint B904.
tools/opcode_metadata/opcodes.py (1)

95-98: ⚡ Quick win

Set stacklevel on warnings.warn for correct caller attribution.

This warning should point to the caller site, not inside helper internals.

Suggested fix
-            warnings.warn(
+            warnings.warn(
                 f"Could not get instruction metadata for {rust_name}"
-                " from CPython or override conf"
+                " from CPython or override conf",
+                stacklevel=2,
             )

As per coding guidelines, "Use ruff for linting Python code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/opcode_metadata/opcodes.py` around lines 95 - 98, The warnings.warn
call that emits "Could not get instruction metadata for {rust_name} from CPython
or override conf" should include a stacklevel so the warning points at the
caller, not this helper; update the warnings.warn call to pass stacklevel=2
(e.g., warnings.warn(..., stacklevel=2)) referencing the existing rust_name
variable and the warnings.warn invocation so linting (ruff) and caller
attribution are correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.pre-commit-config.yaml:
- Around line 46-47: The hook `files` regex currently uses
`tools/opcode_metadata/*` which doesn't match files like
`tools/opcode_metadata/opcodes.py`; update both occurrences of the `files`
pattern in the pre-commit config (the lines with `files:` that currently contain
`tools/opcode_metadata/*`) to use `tools/opcode_metadata/.*` so the
opcode-metadata hooks trigger on changes under that directory (keep
`pass_filenames: false` as-is).

In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 213-233: The macro arm that detects label fields only matches bare
"Label" but opcode variants use "oparg::Label", so add explicit macro arms to
match oparg::Label (mirror the existing Label arms) — e.g. add (`@match`
$self:expr, $name:ident, [$variant:ident { $field:ident : oparg::Label } ,
$($rest:tt)*]) and the final-arm variant (`@match` $self:expr, $name:ident,
[$variant:ident { $field:ident : oparg::Label }]) with the same bodies that
return Some(*$field); this ensures define_opcodes! macro recognizes oparg::Label
fields and preserves CodeObject::label_targets().

In `@tools/opcode_metadata/cpython.py`:
- Around line 17-18: The module currently imports SKIP_PROPERTIES, Analysis,
Family, Properties, analyze_files and get_stack_effect only for downstream
re-export but lint flags them as unused; make this explicit by adding an __all__
=
("SKIP_PROPERTIES","Analysis","Family","Properties","analyze_files","get_stack_effect")
at top (or alternatively append targeted "# noqa: F401" to those import lines)
so the re-exports are intentional and ruff/CI no longer reports F401. Ensure the
symbol names match the imported identifiers exactly (SKIP_PROPERTIES, Analysis,
Family, Properties, analyze_files, get_stack_effect).

In `@tools/opcode_metadata/generate_py_opcode_metadata.py`:
- Around line 12-15: Remove the unused imports that trigger F401 by deleting
"typing" from the top-level imports and removing "to_pascal_case" from the
from-utils import list in generate_py_opcode_metadata.py; update the import line
to only keep used symbols (OpcodeInfo, DEFAULT_INPUT, ROOT, get_conf,
to_upper_snake_case) so the module passes ruff/flake8 linting.
- Around line 84-85: Replace the lambda assigned to key with a small local def
to satisfy E731: create a function (e.g., def sort_key(op): return
(op.cpython_name not in PRIORITY_OPMAP, op.id)) and pass sort_key into
sorted(opcodes, key=sort_key) instead of using key = lambda ...; keep references
to PRIORITY_OPMAP and op.id/op.cpython_name unchanged.

In `@tools/opcode_metadata/generate_rs_opcode_metadata.py`:
- Line 23: The name OpcodeDef used as a type annotation is undefined; either
import the correct type that represents entries returned by
OpcodeInfo.iter_infos (e.g., add an import for OpcodeDef from the module that
defines it) or change the annotation to the actual type produced by
OpcodeInfo.iter_infos (for example Iterator[OpcodeInfo] or list[OpcodeInfo]) and
add the necessary typing imports (from typing import Iterator, List) so
ruff/F821 stops complaining; update the annotation on the variable/parameter
named info (or wherever OpcodeDef is used) accordingly.
- Around line 4-16: Remove the unused imports in the top import block: delete
"collections", "os", "sys", "typing", "tomllib" (if not used elsewhere), and the
unused names from the cpython from-import ("Analysis", "get_analysis",
"get_stack_effect"), and remove "to_pascal_case" from the utils import; keep
only the actually used symbols (e.g., dataclasses, io, pathlib, subprocess,
OpcodeInfo, DEFAULT_INPUT, ROOT, get_conf). Update the import lines in
generate_rs_opcode_metadata.py accordingly and run ruff to verify no F401s
remain.

In `@tools/opcode_metadata/opcodes.py`:
- Around line 35-37: The comparison that skips the family root uses raw
member.name against the normalized family_name, causing mismatches; update the
filter to compare the normalized member_name (the result of
to_pascal_case(member.name)) to family_name (or normalize family_name with
to_pascal_case) so the family opcode is correctly excluded—adjust the logic
around member_name, member.name, family_name and the call to to_pascal_case in
opcodes.py.

In `@tools/opcode_metadata/utils.py`:
- Line 25: Replace the PEP 695 type-alias statement so it parses on Python 3.11:
change the declaration "type OverrideConfs = dict[str, Override]" to an
assignment-style alias "OverrideConfs = dict[str, Override]" (keep the same name
OverrideConfs and the referenced Override type), ensuring any necessary imports
for Override remain unchanged.

---

Nitpick comments:
In `@tools/opcode_metadata/cpython.py`:
- Around line 5-8: The except KeyError block that raises ValueError for missing
CPYTHON_ROOT should chain the original KeyError so the original context is
preserved; modify the except clause to capture the exception (e.g., except
KeyError as err) and re-raise the ValueError using "raise ValueError(\"Missing
environment variable 'CPYTHON_ROOT'\") from err" so the CPYTHON_ROOT resolution
code (pathlib.Path(os.environ["CPYTHON_ROOT"]).expanduser().resolve()) preserves
the original error context for lint B904.

In `@tools/opcode_metadata/opcodes.py`:
- Around line 95-98: The warnings.warn call that emits "Could not get
instruction metadata for {rust_name} from CPython or override conf" should
include a stacklevel so the warning points at the caller, not this helper;
update the warnings.warn call to pass stacklevel=2 (e.g., warnings.warn(...,
stacklevel=2)) referencing the existing rust_name variable and the warnings.warn
invocation so linting (ruff) and caller attribution are correct.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3d449e36-7fc4-4c30-a9b0-afbda43241c9

📥 Commits

Reviewing files that changed from the base of the PR and between dcb273b and f9a754c.

⛔ Files ignored due to path filters (1)
  • Lib/_opcode_metadata.py is excluded by !Lib/**
📒 Files selected for processing (18)
  • .gitattributes
  • .github/workflows/ci.yaml
  • .pre-commit-config.yaml
  • crates/compiler-core/generate.py
  • crates/compiler-core/opcode.toml
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/bytecode/instruction.rs
  • crates/compiler-core/src/bytecode/instructions.rs
  • crates/compiler-core/src/bytecode/oparg.rs
  • crates/compiler-core/src/bytecode/opcode_metadata.rs
  • crates/compiler-core/src/lib.rs
  • scripts/generate_opcode_metadata.py
  • tools/opcode_metadata/conf.toml
  • tools/opcode_metadata/cpython.py
  • tools/opcode_metadata/generate_py_opcode_metadata.py
  • tools/opcode_metadata/generate_rs_opcode_metadata.py
  • tools/opcode_metadata/opcodes.py
  • tools/opcode_metadata/utils.py
💤 Files with no reviewable changes (3)
  • scripts/generate_opcode_metadata.py
  • crates/compiler-core/opcode.toml
  • crates/compiler-core/generate.py

Hide details View details @youknowone youknowone merged commit f95b746 into RustPython:main May 27, 2026
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants