Tighten specialization guards and add send_none fastpath#7359

youknowone

Summary

Add send_none fastpath in Coro to skip the is_none check and type-error branch for the common case (generator iteration via for loops)
Tighten CALL_METHOD_DESCRIPTOR_* guards to verify self type matches descr.objclass, matching CPython's CALL_METHOD_DESCRIPTOR type checks
Allow list.append on subclasses in CALL_LIST_APPEND (use downcast_ref instead of downcast_ref_if_exact)
Add specialization_eval_frame_active guard to prevent specializing CALL/CALL_KW/SEND/FOR_ITER_GEN when custom eval frames are active
Fix FOR_ITER_GEN to deoptimize on guard miss and check running() state before send_none
Fix SEND_GEN to deoptimize on guard miss and add send_none fastpath for None values
Fix specialize_send to require exact generator/coroutine types
Fix specialize_for_iter to validate END_FOR shape and jump delta bounds for generator specialization
Route CALL_METHOD_DESCRIPTOR_FAST through CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS for correct nargs > 1 dispatch
Add CallBuiltinFastWithKeywords fallback for builtin calls with effective_nargs > 1
Skip CALL/CALL_KW specialization for PyFunction when eval frame hook is active

Test plan

CI passes on all platforms (Ubuntu, macOS, Windows)
test_bdb generator stepping works correctly with ForIterGen deoptimize path
test_generators passes with send_none fastpath

🤖 Generated with Claude Code

Summary by CodeRabbit

Refactor
- Centralized coroutine send/result handling for more consistent iteration/stop error mapping and cleaner generator closure behavior.
- Made execution-frame specialization VM-aware to improve optimization decisions and safer fallback behavior during hot paths.
Behavior
- Improved native method call flag inference and method flag availability, yielding more accurate method dispatch semantics (affecting native extensions and method calling behavior).

coderabbitai

📝 Walkthrough

Walkthrough

Refactors coroutine send logic by centralizing post-run handling into finalize_send_result and adding send_none; simultaneously threads vm: &VirtualMachine into multiple frame specialization entry points and adds helpers to guide fast-path decisions. Additional changes infer and propagate native call flags and re-enable method flag constants.

Changes

Cohort / File(s)	Summary
Coroutine send refactor `crates/vm/src/coroutine.rs`	Added `finalize_send_result` (private) to centralize post-run handling (StopIteration/StopAsyncIteration translation, generator closure via `maybe_close`); added `pub(crate) send_none`; updated `send` to delegate through new flow.
Frame specialization (VM-aware) `crates/vm/src/frame.rs`	Threaded `vm: &VirtualMachine` into many specialization entry points (`specialize_send`, `specialize_for_iter`, `specialize_call`, `specialize_call_kw`, etc.), added `specialization_eval_frame_active` and `for_iter_has_end_for_shape`, and expanded fast-path checks/deopt paths.
derive impl — native call flags `crates/derive-impl/src/util.rs`, `crates/derive-impl/src/pyclass.rs`, `crates/derive-impl/src/pymodule.rs`	Added `infer_native_call_flags` and propagated computed `call_flags` through Function/Method nursery items and codegen so generated `PyMethodDef` entries include inferred METH_* flags.
Method flag constants enabled `crates/vm/src/function/method.rs`	Re-enabled/added `PyMethodFlags` constants (`VARARGS`, `KEYWORDS`, `NOARGS`, `O`, and `FASTCALL` under limited API), restoring concrete flag bits used for native method definitions.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Coroutine as Coroutine (send)
    participant Frame as ExecutingFrame (run_with_context)
    participant VM as VirtualMachine
    participant Finalize as finalize_send_result

    Caller->>Coroutine: send(value)
    alt coroutine just-started
        Coroutine->>Coroutine: compute value = None
    else already started
        Coroutine->>Coroutine: use provided value
    end
    Coroutine->>Frame: run_with_context(vm, computed_value)
    Frame->>VM: execute frame
    VM-->>Frame: ExecutionResult
    Frame-->>Coroutine: ExecutionResult
    Coroutine->>Finalize: finalize_send_result(jen, result, vm)
    Finalize->>Finalize: map StopIteration / StopAsyncIteration
    Finalize->>Coroutine: maybe_close frame if final
    Finalize-->>Caller: PyIterReturn / Err

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

RustPython#5842: Modifies coroutine send paths and exception-cause handling, touching related send/return behavior.

Suggested reviewers

ShaharNaveh
arihant2math

Poem

🐰 I hopped through frames to tidy the send,
centralized the end so errors mend,
VM now whispers to specialization lanes,
flags bloom in methods, and code reclaims its reins.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly reflects the main objectives: adding a send_none fastpath for generators and tightening specialization guards for bytecode optimization.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/vm/src/frame.rs (1)

3160-3164: SendGen can use send_none for the common None case.

Instruction::Send already does this, but Instruction::SendGen still always calls send. Aligning both paths keeps the intended fastpath consistent.

Suggested patch

-                    match coro.send(receiver, val, vm)? {
+                    let ret = if vm.is_none(&val) {
+                        coro.send_none(receiver, vm)?
+                    } else {
+                        coro.send(receiver, val, vm)?
+                    };
+                    match ret {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 3160 - 3164, The SendGen path currently
calls coro.send(...) even when val is None; change it to use the fastpath method
coro.send_none(...) (or send_none) when val equals PyNone to match
Instruction::Send behavior—locate the block gated by can_fast_send in the
SendGen handling (symbols: SendGen, can_fast_send, top_value, builtin_coro,
send) and replace the call to coro.send(receiver, val, vm) with a conditional
that invokes coro.send_none(receiver, vm) for the None case and falls back to
coro.send(...) otherwise, preserving the existing PyIterReturn handling and
error propagation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 8145-8148: The specialization_eval_frame_active method currently
returns false unconditionally, disabling frame-aware specialization guards; fix
it by reading the VM tracing flags instead of returning false: in
specialization_eval_frame_active(&self, vm: &VirtualMachine) check
vm.trace_func, vm.profile_func, and vm.use_tracing (matching the pattern used in
vm/mod.rs around the other checks) and return true when any of those indicate
tracing/profile is active so frame-aware specialization is disabled when tracing
or profiling is enabled.

---

Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Around line 3160-3164: The SendGen path currently calls coro.send(...) even
when val is None; change it to use the fastpath method coro.send_none(...) (or
send_none) when val equals PyNone to match Instruction::Send behavior—locate the
block gated by can_fast_send in the SendGen handling (symbols: SendGen,
can_fast_send, top_value, builtin_coro, send) and replace the call to
coro.send(receiver, val, vm) with a conditional that invokes
coro.send_none(receiver, vm) for the None case and falls back to coro.send(...)
otherwise, preserving the existing PyIterReturn handling and error propagation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 2278b9c7-6e88-4782-9d3f-ed74f4874b53

📥 Commits

Reviewing files that changed from the base of the PR and between 375b547 and 3e92a7a.

📒 Files selected for processing (2)

crates/vm/src/coroutine.rs
crates/vm/src/frame.rs

- Implement specialization_eval_frame_active to check vm.use_tracing so specializations are skipped when tracing/profiling is active - Add send_none fastpath in SendGen handler for the common None case

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

crates/vm/src/coroutine.rs (1)

146-182: Deduplicate send_none/send shared execution path.

Both functions duplicate the same closed-check / frame-prepare / resume / finalize flow and only differ in value preparation.

♻️ Proposed refactor

+    fn send_impl<F>(
+        &self,
+        jen: &PyObject,
+        vm: &VirtualMachine,
+        value_for_state: F,
+    ) -> PyResult<PyIterReturn>
+    where
+        F: FnOnce(bool) -> PyResult<Option<PyObjectRef>>,
+    {
+        if self.closed.load() {
+            return Ok(PyIterReturn::StopIteration(None));
+        }
+        let started = self.frame.lasti() > 0;
+        let value = value_for_state(started)?;
+        self.frame.locals_to_fast(vm)?;
+        let result = self.run_with_context(jen, vm, |f| f.resume(value, vm));
+        self.finalize_send_result(jen, result, vm)
+    }
+
     pub(crate) fn send_none(&self, jen: &PyObject, vm: &VirtualMachine) -> PyResult<PyIterReturn> {
-        if self.closed.load() {
-            return Ok(PyIterReturn::StopIteration(None));
-        }
-        self.frame.locals_to_fast(vm)?;
-        let value = if self.frame.lasti() > 0 {
-            Some(vm.ctx.none())
-        } else {
-            None
-        };
-        let result = self.run_with_context(jen, vm, |f| f.resume(value, vm));
-        self.finalize_send_result(jen, result, vm)
+        self.send_impl(jen, vm, |started| {
+            Ok(if started { Some(vm.ctx.none()) } else { None })
+        })
     }
 
     pub fn send(
         &self,
         jen: &PyObject,
         value: PyObjectRef,
         vm: &VirtualMachine,
     ) -> PyResult<PyIterReturn> {
-        if self.closed.load() {
-            return Ok(PyIterReturn::StopIteration(None));
-        }
-        self.frame.locals_to_fast(vm)?;
-        let value = if self.frame.lasti() > 0 {
-            Some(value)
-        } else if !vm.is_none(&value) {
-            return Err(vm.new_type_error(format!(
-                "can't send non-None value to a just-started {}",
-                gen_name(jen, vm),
-            )));
-        } else {
-            None
-        };
-        let result = self.run_with_context(jen, vm, |f| f.resume(value, vm));
-        self.finalize_send_result(jen, result, vm)
+        self.send_impl(jen, vm, |started| {
+            if started {
+                Ok(Some(value))
+            } else if !vm.is_none(&value) {
+                Err(vm.new_type_error(format!(
+                    "can't send non-None value to a just-started {}",
+                    gen_name(jen, vm),
+                )))
+            } else {
+                Ok(None)
+            }
+        })
     }

As per coding guidelines "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/coroutine.rs` around lines 146 - 182, Both send_none and send
duplicate the same closed-check, frame.locals_to_fast call, resume via
run_with_context, and finalize_send_result logic; extract that shared flow into
a helper (e.g., send_common or resume_with_value) that accepts the prepared
Option<PyObjectRef> (or a closure producing it) and performs closed.load check,
self.frame.locals_to_fast(vm)?, calls run_with_context(jen, vm, |f|
f.resume(value, vm)), and returns finalize_send_result; then simplify send_none
to compute its value (using frame.lasti() and vm.ctx.none()) and call the
helper, and simplify send to compute its value (respecting frame.lasti() and the
non-None error) and call the same helper, keeping all original calls to
frame.lasti, run_with_context, finalize_send_result, and closed.load.

crates/vm/src/frame.rs (1)

7837-7861: Reduce duplicated guard/backoff branches in descriptor call-conv selection.

The NOARGS and O branches duplicate the same backoff/return structure with only expected-arity differing. Extract expected arity first, then run one shared guard+backoff path.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7837 - 7861, The NOARGS and O branches
in the descriptor call-conv selection duplicate the same guard/backoff/return
logic; refactor by computing the expected arity from call_conv (e.g., map
PyMethodFlags::NOARGS -> 0, PyMethodFlags::O -> 1) and choose the Instruction
variant after the check, then perform a single shared guard that compares nargs
to expected_arity and on mismatch runs the adaptive backoff using
self.code.instructions.read_adaptive_counter(cache_base) and
write_adaptive_counter(...bytecode::adaptive_counter_backoff(...)) and returns;
update references to new_op, call_conv, nargs, cache_base, and the Instruction
variants (CallMethodDescriptorNoargs, etc.) accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/derive-impl/src/pyclass.rs`:
- Around line 1018-1028: The drop_first_typed logic incorrectly ignores the
parsed raw flag, causing methods marked #[pymethod(raw)] or
#[pyclassmethod(raw)] without a receiver to drop the first typed argument;
update the match that sets drop_first_typed (which currently checks
self.inner.attr_name and has_receiver) to also require !raw before returning 1
so that raw methods do not decrement the first typed arg, then pass the
unchanged drop_first_typed into infer_native_call_flags(func.sig(),
drop_first_typed).

---

Nitpick comments:
In `@crates/vm/src/coroutine.rs`:
- Around line 146-182: Both send_none and send duplicate the same closed-check,
frame.locals_to_fast call, resume via run_with_context, and finalize_send_result
logic; extract that shared flow into a helper (e.g., send_common or
resume_with_value) that accepts the prepared Option<PyObjectRef> (or a closure
producing it) and performs closed.load check, self.frame.locals_to_fast(vm)?,
calls run_with_context(jen, vm, |f| f.resume(value, vm)), and returns
finalize_send_result; then simplify send_none to compute its value (using
frame.lasti() and vm.ctx.none()) and call the helper, and simplify send to
compute its value (respecting frame.lasti() and the non-None error) and call the
same helper, keeping all original calls to frame.lasti, run_with_context,
finalize_send_result, and closed.load.

In `@crates/vm/src/frame.rs`:
- Around line 7837-7861: The NOARGS and O branches in the descriptor call-conv
selection duplicate the same guard/backoff/return logic; refactor by computing
the expected arity from call_conv (e.g., map PyMethodFlags::NOARGS -> 0,
PyMethodFlags::O -> 1) and choose the Instruction variant after the check, then
perform a single shared guard that compares nargs to expected_arity and on
mismatch runs the adaptive backoff using
self.code.instructions.read_adaptive_counter(cache_base) and
write_adaptive_counter(...bytecode::adaptive_counter_backoff(...)) and returns;
update references to new_op, call_conv, nargs, cache_base, and the Instruction
variants (CallMethodDescriptorNoargs, etc.) accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: a11e8867-e0c0-4327-8d74-a8a000cacd05

📥 Commits

Reviewing files that changed from the base of the PR and between 3e92a7a and c7d095a.

📒 Files selected for processing (6)

crates/derive-impl/src/pyclass.rs
crates/derive-impl/src/pymodule.rs
crates/derive-impl/src/util.rs
crates/vm/src/coroutine.rs
crates/vm/src/frame.rs
crates/vm/src/function/method.rs

…7359) * vm: align specialization guards with CPython patterns * vm: tighten call specialization runtime guards * vm: add send_none fastpath for generator specialization * vm: restrict method-descriptor specialization to methods * vm: deopt call specializations on guard misses * vm: match CPython send/for-iter closed-frame guards * vm: restrict len/isinstance specialization to builtins * vm: use exact-type guards for call specializations * vm: align class-call specialization flow with CPython * vm: prefer FAST call opcodes for positional builtin calls * vm: add callable identity guard to CALL_LIST_APPEND * vm: make CALL_LIST_APPEND runtime guard pointer-based * vm: align call guard cache and fallback behavior with CPython * vm: use base vectorcall fallback for EXIT-style call misses * vm: simplify CALL_LEN/CALL_ISINSTANCE runtime guards * vm: infer call-convention flags for CPython-style CALL specialization * vm: check use_tracing in eval_frame_active, add SendGen send_none - Implement specialization_eval_frame_active to check vm.use_tracing so specializations are skipped when tracing/profiling is active - Add send_none fastpath in SendGen handler for the common None case

coderabbitai Bot reviewed Mar 5, 2026

View reviewed changes

youknowone added 17 commits March 5, 2026 23:04

vm: align specialization guards with CPython patterns

e4abe4a

vm: tighten call specialization runtime guards

1d69c5b

vm: add send_none fastpath for generator specialization

f0a235e

vm: restrict method-descriptor specialization to methods

e518287

vm: deopt call specializations on guard misses

01cdfb5

vm: match CPython send/for-iter closed-frame guards

690921a

vm: restrict len/isinstance specialization to builtins

ed4b7fa

vm: use exact-type guards for call specializations

78dbbb0

vm: align class-call specialization flow with CPython

01a4c27

vm: prefer FAST call opcodes for positional builtin calls

397143b

vm: add callable identity guard to CALL_LIST_APPEND

41e58d3

vm: make CALL_LIST_APPEND runtime guard pointer-based

c694fe1

vm: align call guard cache and fallback behavior with CPython

ab82f5b

vm: use base vectorcall fallback for EXIT-style call misses

f630d01

vm: simplify CALL_LEN/CALL_ISINSTANCE runtime guards

eeff412

vm: infer call-convention flags for CPython-style CALL specialization

2a23d3b

vm: check use_tracing in eval_frame_active, add SendGen send_none …

c7d095a

- Implement specialization_eval_frame_active to check vm.use_tracing so specializations are skipped when tracing/profiling is active - Add send_none fastpath in SendGen handler for the common None case

youknowone force-pushed the specialization branch from 3e92a7a to c7d095a Compare March 5, 2026 14:06

coderabbitai Bot reviewed Mar 5, 2026

View reviewed changes

youknowone deleted the specialization branch March 5, 2026 15:59

coderabbitai Bot mentioned this pull request Mar 5, 2026

Harden CALL specialization guards and cache callables #7360

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tighten specialization guards and add send_none fastpath#7359