◐ Shell
reader mode source ↗
Skip to content

Tighten specialization guards and add send_none fastpath#7359

Merged
youknowone merged 17 commits into
RustPython:mainfrom
youknowone:specialization
Mar 5, 2026
Merged

Tighten specialization guards and add send_none fastpath#7359
youknowone merged 17 commits into
RustPython:mainfrom
youknowone:specialization

Conversation

@youknowone

@youknowone youknowone commented Mar 5, 2026

Copy link
Copy Markdown
Member

Summary

  • Add send_none fastpath in Coro to skip the is_none check and type-error branch for the common case (generator iteration via for loops)
  • Tighten CALL_METHOD_DESCRIPTOR_* guards to verify self type matches descr.objclass, matching CPython's CALL_METHOD_DESCRIPTOR type checks
  • Allow list.append on subclasses in CALL_LIST_APPEND (use downcast_ref instead of downcast_ref_if_exact)
  • Add specialization_eval_frame_active guard to prevent specializing CALL/CALL_KW/SEND/FOR_ITER_GEN when custom eval frames are active
  • Fix FOR_ITER_GEN to deoptimize on guard miss and check running() state before send_none
  • Fix SEND_GEN to deoptimize on guard miss and add send_none fastpath for None values
  • Fix specialize_send to require exact generator/coroutine types
  • Fix specialize_for_iter to validate END_FOR shape and jump delta bounds for generator specialization
  • Route CALL_METHOD_DESCRIPTOR_FAST through CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS for correct nargs > 1 dispatch
  • Add CallBuiltinFastWithKeywords fallback for builtin calls with effective_nargs > 1
  • Skip CALL/CALL_KW specialization for PyFunction when eval frame hook is active

Test plan

  • CI passes on all platforms (Ubuntu, macOS, Windows)
  • test_bdb generator stepping works correctly with ForIterGen deoptimize path
  • test_generators passes with send_none fastpath

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Centralized coroutine send/result handling for more consistent iteration/stop error mapping and cleaner generator closure behavior.
    • Made execution-frame specialization VM-aware to improve optimization decisions and safer fallback behavior during hot paths.
  • Behavior
    • Improved native method call flag inference and method flag availability, yielding more accurate method dispatch semantics (affecting native extensions and method calling behavior).

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Refactors coroutine send logic by centralizing post-run handling into finalize_send_result and adding send_none; simultaneously threads vm: &VirtualMachine into multiple frame specialization entry points and adds helpers to guide fast-path decisions. Additional changes infer and propagate native call flags and re-enable method flag constants.

Changes

Cohort / File(s) Summary
Coroutine send refactor
crates/vm/src/coroutine.rs
Added finalize_send_result (private) to centralize post-run handling (StopIteration/StopAsyncIteration translation, generator closure via maybe_close); added pub(crate) send_none; updated send to delegate through new flow.
Frame specialization (VM-aware)
crates/vm/src/frame.rs
Threaded vm: &VirtualMachine into many specialization entry points (specialize_send, specialize_for_iter, specialize_call, specialize_call_kw, etc.), added specialization_eval_frame_active and for_iter_has_end_for_shape, and expanded fast-path checks/deopt paths.
derive impl — native call flags
crates/derive-impl/src/util.rs, crates/derive-impl/src/pyclass.rs, crates/derive-impl/src/pymodule.rs
Added infer_native_call_flags and propagated computed call_flags through Function/Method nursery items and codegen so generated PyMethodDef entries include inferred METH_* flags.
Method flag constants enabled
crates/vm/src/function/method.rs
Re-enabled/added PyMethodFlags constants (VARARGS, KEYWORDS, NOARGS, O, and FASTCALL under limited API), restoring concrete flag bits used for native method definitions.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Coroutine as Coroutine (send)
    participant Frame as ExecutingFrame (run_with_context)
    participant VM as VirtualMachine
    participant Finalize as finalize_send_result

    Caller->>Coroutine: send(value)
    alt coroutine just-started
        Coroutine->>Coroutine: compute value = None
    else already started
        Coroutine->>Coroutine: use provided value
    end
    Coroutine->>Frame: run_with_context(vm, computed_value)
    Frame->>VM: execute frame
    VM-->>Frame: ExecutionResult
    Frame-->>Coroutine: ExecutionResult
    Coroutine->>Finalize: finalize_send_result(jen, result, vm)
    Finalize->>Finalize: map StopIteration / StopAsyncIteration
    Finalize->>Coroutine: maybe_close frame if final
    Finalize-->>Caller: PyIterReturn / Err
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • RustPython#5842: Modifies coroutine send paths and exception-cause handling, touching related send/return behavior.

Suggested reviewers

  • ShaharNaveh
  • arihant2math

Poem

🐰 I hopped through frames to tidy the send,
centralized the end so errors mend,
VM now whispers to specialization lanes,
flags bloom in methods, and code reclaims its reins.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly reflects the main objectives: adding a send_none fastpath for generators and tightening specialization guards for bytecode optimization.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

3160-3164: SendGen can use send_none for the common None case.

Instruction::Send already does this, but Instruction::SendGen still always calls send. Aligning both paths keeps the intended fastpath consistent.

Suggested patch
-                    match coro.send(receiver, val, vm)? {
+                    let ret = if vm.is_none(&val) {
+                        coro.send_none(receiver, vm)?
+                    } else {
+                        coro.send(receiver, val, vm)?
+                    };
+                    match ret {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 3160 - 3164, The SendGen path currently
calls coro.send(...) even when val is None; change it to use the fastpath method
coro.send_none(...) (or send_none) when val equals PyNone to match
Instruction::Send behavior—locate the block gated by can_fast_send in the
SendGen handling (symbols: SendGen, can_fast_send, top_value, builtin_coro,
send) and replace the call to coro.send(receiver, val, vm) with a conditional
that invokes coro.send_none(receiver, vm) for the None case and falls back to
coro.send(...) otherwise, preserving the existing PyIterReturn handling and
error propagation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 8145-8148: The specialization_eval_frame_active method currently
returns false unconditionally, disabling frame-aware specialization guards; fix
it by reading the VM tracing flags instead of returning false: in
specialization_eval_frame_active(&self, vm: &VirtualMachine) check
vm.trace_func, vm.profile_func, and vm.use_tracing (matching the pattern used in
vm/mod.rs around the other checks) and return true when any of those indicate
tracing/profile is active so frame-aware specialization is disabled when tracing
or profiling is enabled.

---

Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Around line 3160-3164: The SendGen path currently calls coro.send(...) even
when val is None; change it to use the fastpath method coro.send_none(...) (or
send_none) when val equals PyNone to match Instruction::Send behavior—locate the
block gated by can_fast_send in the SendGen handling (symbols: SendGen,
can_fast_send, top_value, builtin_coro, send) and replace the call to
coro.send(receiver, val, vm) with a conditional that invokes
coro.send_none(receiver, vm) for the None case and falls back to coro.send(...)
otherwise, preserving the existing PyIterReturn handling and error propagation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 2278b9c7-6e88-4782-9d3f-ed74f4874b53

📥 Commits

Reviewing files that changed from the base of the PR and between 375b547 and 3e92a7a.

📒 Files selected for processing (2)
  • crates/vm/src/coroutine.rs
  • crates/vm/src/frame.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/vm/src/coroutine.rs (1)

146-182: Deduplicate send_none/send shared execution path.

Both functions duplicate the same closed-check / frame-prepare / resume / finalize flow and only differ in value preparation.

♻️ Proposed refactor
+    fn send_impl<F>(
+        &self,
+        jen: &PyObject,
+        vm: &VirtualMachine,
+        value_for_state: F,
+    ) -> PyResult<PyIterReturn>
+    where
+        F: FnOnce(bool) -> PyResult<Option<PyObjectRef>>,
+    {
+        if self.closed.load() {
+            return Ok(PyIterReturn::StopIteration(None));
+        }
+        let started = self.frame.lasti() > 0;
+        let value = value_for_state(started)?;
+        self.frame.locals_to_fast(vm)?;
+        let result = self.run_with_context(jen, vm, |f| f.resume(value, vm));
+        self.finalize_send_result(jen, result, vm)
+    }
+
     pub(crate) fn send_none(&self, jen: &PyObject, vm: &VirtualMachine) -> PyResult<PyIterReturn> {
-        if self.closed.load() {
-            return Ok(PyIterReturn::StopIteration(None));
-        }
-        self.frame.locals_to_fast(vm)?;
-        let value = if self.frame.lasti() > 0 {
-            Some(vm.ctx.none())
-        } else {
-            None
-        };
-        let result = self.run_with_context(jen, vm, |f| f.resume(value, vm));
-        self.finalize_send_result(jen, result, vm)
+        self.send_impl(jen, vm, |started| {
+            Ok(if started { Some(vm.ctx.none()) } else { None })
+        })
     }
 
     pub fn send(
         &self,
         jen: &PyObject,
         value: PyObjectRef,
         vm: &VirtualMachine,
     ) -> PyResult<PyIterReturn> {
-        if self.closed.load() {
-            return Ok(PyIterReturn::StopIteration(None));
-        }
-        self.frame.locals_to_fast(vm)?;
-        let value = if self.frame.lasti() > 0 {
-            Some(value)
-        } else if !vm.is_none(&value) {
-            return Err(vm.new_type_error(format!(
-                "can't send non-None value to a just-started {}",
-                gen_name(jen, vm),
-            )));
-        } else {
-            None
-        };
-        let result = self.run_with_context(jen, vm, |f| f.resume(value, vm));
-        self.finalize_send_result(jen, result, vm)
+        self.send_impl(jen, vm, |started| {
+            if started {
+                Ok(Some(value))
+            } else if !vm.is_none(&value) {
+                Err(vm.new_type_error(format!(
+                    "can't send non-None value to a just-started {}",
+                    gen_name(jen, vm),
+                )))
+            } else {
+                Ok(None)
+            }
+        })
     }

As per coding guidelines "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/coroutine.rs` around lines 146 - 182, Both send_none and send
duplicate the same closed-check, frame.locals_to_fast call, resume via
run_with_context, and finalize_send_result logic; extract that shared flow into
a helper (e.g., send_common or resume_with_value) that accepts the prepared
Option<PyObjectRef> (or a closure producing it) and performs closed.load check,
self.frame.locals_to_fast(vm)?, calls run_with_context(jen, vm, |f|
f.resume(value, vm)), and returns finalize_send_result; then simplify send_none
to compute its value (using frame.lasti() and vm.ctx.none()) and call the
helper, and simplify send to compute its value (respecting frame.lasti() and the
non-None error) and call the same helper, keeping all original calls to
frame.lasti, run_with_context, finalize_send_result, and closed.load.
crates/vm/src/frame.rs (1)

7837-7861: Reduce duplicated guard/backoff branches in descriptor call-conv selection.

The NOARGS and O branches duplicate the same backoff/return structure with only expected-arity differing. Extract expected arity first, then run one shared guard+backoff path.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7837 - 7861, The NOARGS and O branches
in the descriptor call-conv selection duplicate the same guard/backoff/return
logic; refactor by computing the expected arity from call_conv (e.g., map
PyMethodFlags::NOARGS -> 0, PyMethodFlags::O -> 1) and choose the Instruction
variant after the check, then perform a single shared guard that compares nargs
to expected_arity and on mismatch runs the adaptive backoff using
self.code.instructions.read_adaptive_counter(cache_base) and
write_adaptive_counter(...bytecode::adaptive_counter_backoff(...)) and returns;
update references to new_op, call_conv, nargs, cache_base, and the Instruction
variants (CallMethodDescriptorNoargs, etc.) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/derive-impl/src/pyclass.rs`:
- Around line 1018-1028: The drop_first_typed logic incorrectly ignores the
parsed raw flag, causing methods marked #[pymethod(raw)] or
#[pyclassmethod(raw)] without a receiver to drop the first typed argument;
update the match that sets drop_first_typed (which currently checks
self.inner.attr_name and has_receiver) to also require !raw before returning 1
so that raw methods do not decrement the first typed arg, then pass the
unchanged drop_first_typed into infer_native_call_flags(func.sig(),
drop_first_typed).

---

Nitpick comments:
In `@crates/vm/src/coroutine.rs`:
- Around line 146-182: Both send_none and send duplicate the same closed-check,
frame.locals_to_fast call, resume via run_with_context, and finalize_send_result
logic; extract that shared flow into a helper (e.g., send_common or
resume_with_value) that accepts the prepared Option<PyObjectRef> (or a closure
producing it) and performs closed.load check, self.frame.locals_to_fast(vm)?,
calls run_with_context(jen, vm, |f| f.resume(value, vm)), and returns
finalize_send_result; then simplify send_none to compute its value (using
frame.lasti() and vm.ctx.none()) and call the helper, and simplify send to
compute its value (respecting frame.lasti() and the non-None error) and call the
same helper, keeping all original calls to frame.lasti, run_with_context,
finalize_send_result, and closed.load.

In `@crates/vm/src/frame.rs`:
- Around line 7837-7861: The NOARGS and O branches in the descriptor call-conv
selection duplicate the same guard/backoff/return logic; refactor by computing
the expected arity from call_conv (e.g., map PyMethodFlags::NOARGS -> 0,
PyMethodFlags::O -> 1) and choose the Instruction variant after the check, then
perform a single shared guard that compares nargs to expected_arity and on
mismatch runs the adaptive backoff using
self.code.instructions.read_adaptive_counter(cache_base) and
write_adaptive_counter(...bytecode::adaptive_counter_backoff(...)) and returns;
update references to new_op, call_conv, nargs, cache_base, and the Instruction
variants (CallMethodDescriptorNoargs, etc.) accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: a11e8867-e0c0-4327-8d74-a8a000cacd05

📥 Commits

Reviewing files that changed from the base of the PR and between 3e92a7a and c7d095a.

📒 Files selected for processing (6)
  • crates/derive-impl/src/pyclass.rs
  • crates/derive-impl/src/pymodule.rs
  • crates/derive-impl/src/util.rs
  • crates/vm/src/coroutine.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/function/method.rs

Hide details View details @youknowone youknowone merged commit 7620c28 into RustPython:main Mar 5, 2026
13 checks passed
@youknowone youknowone deleted the specialization branch March 5, 2026 15:59
youknowone added a commit to youknowone/RustPython that referenced this pull request Mar 22, 2026
…7359)

* vm: align specialization guards with CPython patterns

* vm: tighten call specialization runtime guards

* vm: add send_none fastpath for generator specialization

* vm: restrict method-descriptor specialization to methods

* vm: deopt call specializations on guard misses

* vm: match CPython send/for-iter closed-frame guards

* vm: restrict len/isinstance specialization to builtins

* vm: use exact-type guards for call specializations

* vm: align class-call specialization flow with CPython

* vm: prefer FAST call opcodes for positional builtin calls

* vm: add callable identity guard to CALL_LIST_APPEND

* vm: make CALL_LIST_APPEND runtime guard pointer-based

* vm: align call guard cache and fallback behavior with CPython

* vm: use base vectorcall fallback for EXIT-style call misses

* vm: simplify CALL_LEN/CALL_ISINSTANCE runtime guards

* vm: infer call-convention flags for CPython-style CALL specialization

* vm: check use_tracing in eval_frame_active, add SendGen send_none

- Implement specialization_eval_frame_active to check vm.use_tracing
  so specializations are skipped when tracing/profiling is active
- Add send_none fastpath in SendGen handler for the common None case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant