
Add vectorcall (PEP 590) dispatch for function calls #7329

Merged
youknowone merged 4 commits into RustPython:main from youknowone:skip-class on Mar 3, 2026


@youknowone youknowone commented Mar 3, 2026

Member

Add VectorCallFunc slot to PyTypeSlots and vectorcall dispatch path in the interpreter loop for Call and CallKw instructions.

Implement vectorcall for PyFunction (with fast path for simple positional-only calls that fills fastlocals directly), PyBoundMethod (avoids prepend_arg O(n) shift), and PyNativeFunction.

Add FuncArgs::from_vectorcall helper for fallback conversion. The vectorcall slot is inherited together with the call slot and cleared when __call__ is overridden in Python subclasses.
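
As a rough illustration of the fallback conversion mentioned above, the sketch below splits a flat vectorcall-style argument slice into positional values and keyword pairs. It is not the RustPython implementation: `split_vectorcall` is a hypothetical name, and plain `&str` values stand in for `PyObjectRef` and for the kwnames tuple.

```rust
/// Split a flat vectorcall-style argument slice (hypothetical helper).
///
/// `args[..nargs]` are the positional values; the values after them pair up,
/// in order, with the names in `kwnames`.
/// This version slices without validation and panics if `nargs > args.len()`.
fn split_vectorcall<'a>(
    args: &[&'a str],
    nargs: usize,
    kwnames: &[&'a str],
) -> (Vec<&'a str>, Vec<(&'a str, &'a str)>) {
    let positional = args[..nargs].to_vec();
    let keywords = kwnames
        .iter()
        .copied()
        .zip(args[nargs..].iter().copied())
        .collect();
    (positional, keywords)
}
```

For a call shaped like `f(1, 2, z=3)`, the flat slice would be `["1", "2", "3"]` with `nargs = 2` and `kwnames = ["z"]`.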

Summary by CodeRabbit

Release Notes

  • Refactor
    • Implemented vectorcall (PEP 590) support throughout the VM for more efficient function call handling, reducing memory allocations and improving function call performance across builtin functions, regular functions, bound methods, and frame execution.


coderabbitai Bot commented Mar 3, 2026

Contributor
Walkthrough

This PR implements PEP 590 vectorcall support across the VM to enable fast-path calling for builtin functions, Python functions, and bound methods. It adds vectorcall slots to the type system, modifies frame dispatch to use vectorcall when available with fallback to existing paths, and enables owned-argument optimization to reduce allocations.
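
The "vectorcall when available, otherwise fall back" dispatch described in this walkthrough might look roughly like the following. This is a simplified sketch under stated assumptions: `Slots`, `VectorCallFn`, `CallFn`, and `dispatch` are hypothetical stand-ins, and `i64` values replace Python objects.

```rust
/// Hypothetical vectorcall signature: flat args, positional count, optional keyword names.
type VectorCallFn = fn(&[i64], usize, Option<&[&str]>) -> i64;
/// Hypothetical generic call signature taking pre-split arguments.
type CallFn = fn(Vec<i64>, Vec<(String, i64)>) -> i64;

/// Toy type slots: either entry may be absent.
struct Slots {
    call: Option<CallFn>,
    vectorcall: Option<VectorCallFn>,
}

/// Prefer the vectorcall slot; otherwise repackage the arguments for `call`.
/// Returns `None` when the object is not callable at all.
fn dispatch(slots: &Slots, args: &[i64], nargs: usize, kwnames: Option<&[&str]>) -> Option<i64> {
    if let Some(vc) = slots.vectorcall {
        // Fast path: hand the flat slice over without building intermediate containers.
        return Some(vc(args, nargs, kwnames));
    }
    let call = slots.call?;
    // Fallback path: allocate the positional vector and the keyword pairs.
    let positional = args[..nargs].to_vec();
    let keywords = kwnames
        .unwrap_or(&[])
        .iter()
        .map(|name| name.to_string())
        .zip(args[nargs..].iter().copied())
        .collect();
    Some(call(positional, keywords))
}

/// Example vectorcall target: sums every value it receives.
fn sum_fast(args: &[i64], _nargs: usize, _kwnames: Option<&[&str]>) -> i64 {
    args.iter().sum()
}

/// Example generic target: sums positional and keyword values.
fn sum_slow(positional: Vec<i64>, keywords: Vec<(String, i64)>) -> i64 {
    positional.iter().sum::<i64>() + keywords.iter().map(|(_, v)| v).sum::<i64>()
}
```

Both paths produce the same result here; the difference is only in how much repackaging happens before the target runs.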

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Type System & Slot Infrastructure: crates/vm/src/types/slot.rs, crates/vm/src/types/slot_defs.rs | Introduces VectorCallFunc type alias and vectorcall field in PyTypeSlots. Extends slot inheritance and copying logic to propagate vectorcall alongside TpCall in MRO and base-slot paths. |
| Function Argument Handling: crates/vm/src/function/argument.rs | Adds FuncArgs::from_vectorcall constructor to build FuncArgs from vectorcall-style arguments (positional + kwnames pairs). |
| Builtin Function Vectorcall: crates/vm/src/builtins/builtin_func.rs | Implements vectorcall_native_function for builtin functions with self-handling logic, avoiding O(n) prepend; registers vectorcall on builtin_function_or_method_type slots. |
| Python Function & Bound Method Vectorcall: crates/vm/src/builtins/function.rs | Refactors invoke_exact_args to accept owned Vec<PyObjectRef>; adds vectorcall_function for Python functions and vectorcall_bound_method for bound methods; registers both in type initialization. |
| Callable Protocol: crates/vm/src/protocol/callable.rs | Extends PyCallable with vectorcall field and new invoke_vectorcall method; dispatches through vectorcall slot when available with fallback to FuncArgs-based invocation; supports tracing integration. |
| Frame Execution & Dispatch: crates/vm/src/frame.rs | Adds execute_call_vectorcall and execute_call_kw_vectorcall helpers to dispatch through vectorcall slots in Call and CallKw instruction handling; updates specialized call paths to use vectorcall when available. |
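
The slot propagation rule from the type system changes above (vectorcall travels with the call slot, but is dropped when a subclass overrides `__call__`) can be sketched as follows. `Slots` and `inherit_slots` are hypothetical, and the stated reason for clearing the slot is an inference rather than something the PR spells out.

```rust
/// Toy slot table; each slot is an optional function pointer.
#[derive(Clone, Copy)]
struct Slots {
    call: Option<fn() -> &'static str>,
    vectorcall: Option<fn() -> &'static str>,
}

/// Compute a subclass's slots from its base (hypothetical helper).
/// `overrides_call` models a Python-level `__call__` defined on the subclass.
fn inherit_slots(base: &Slots, overrides_call: Option<fn() -> &'static str>) -> Slots {
    match overrides_call {
        // Assumption: the inherited fast path would bypass the override,
        // so it is cleared and calls go through the new `call` slot.
        Some(new_call) => Slots { call: Some(new_call), vectorcall: None },
        // No override: both slots are inherited together.
        None => *base,
    }
}

fn base_call() -> &'static str { "base call" }
fn base_vectorcall() -> &'static str { "base vectorcall" }
fn subclass_call() -> &'static str { "subclass __call__" }
```

A subclass without its own `__call__` keeps the fast path; one that defines `__call__` ends up with only the generic call slot.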

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • ShaharNaveh

Poem

🐰 A vectorcall swift,
No clones, no prepends in sight,
Fast args take flight!
Slots aligned with grace,
PEP 590 sets the pace. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add vectorcall (PEP 590) dispatch for function calls' directly and accurately reflects the main objective of this pull request, which is to implement vectorcall support across the interpreter. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



github-actions Bot commented Mar 3, 2026

Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all

Please pull the latest changes before pushing again:
git pull origin skip-class

…ecialized paths

- invoke_exact_args takes Vec by value and uses drain() to move args
  into fastlocals instead of cloning (eliminates refcount overhead)
- CallPyGeneral and CallBoundMethodGeneral now call vectorcall_function
  directly instead of going through FuncArgs + prepend_arg + invoke
- CallKwPy and CallKwBoundMethod use vectorcall_function with kwnames
- vectorcall_bound_method uses insert(0) on existing Vec instead of
  allocating a second Vec
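
The two optimizations in this commit message, moving arguments with drain() instead of cloning and prepending with insert(0) on the existing Vec, can be illustrated with reference-counted values. This is only a sketch: `Rc<str>` stands in for `PyObjectRef`, and `fill_fastlocals` and `bind_self` are hypothetical names.

```rust
use std::rc::Rc;

/// Move every argument out of `args` into the leading slots of `fastlocals`.
/// No `Rc::clone` happens, so reference counts are left untouched.
fn fill_fastlocals(mut args: Vec<Rc<str>>, fastlocals: &mut [Option<Rc<str>>]) {
    for (slot, arg) in fastlocals.iter_mut().zip(args.drain(..)) {
        *slot = Some(arg);
    }
}

/// Prepend the bound `self` to the existing argument vector in place,
/// reusing its allocation instead of building a second vector.
fn bind_self(self_obj: Rc<str>, mut args: Vec<Rc<str>>) -> Vec<Rc<str>> {
    args.insert(0, self_obj);
    args
}
```

Taking the vector by value is what makes the move possible: the callee owns the elements and can hand them to their final slots directly.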
github-actions Bot and others added 2 commits March 3, 2026 12:35
When needs_self was true and kwargs were present, pos_args only
contained positional args (self + original positionals) but
from_vectorcall expected kwarg values to follow in the slice.

Build the full args array (self + all original args including kwarg
values) before passing to from_vectorcall.
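
A minimal sketch of the fix described above, assuming a flat layout in which keyword values follow the positional ones. `with_self` is a hypothetical helper and `&str` values stand in for Python objects; the point is that the rebuilt array must carry the keyword values as well as `self` and the positionals.

```rust
/// Build the flat argument array handed to a `from_vectorcall`-style splitter
/// when `self` has to be prepended (hypothetical helper).
/// Returns the new array together with the new positional count.
fn with_self<'a>(self_obj: &'a str, args: &[&'a str], nargs: usize) -> (Vec<&'a str>, usize) {
    let mut full = Vec::with_capacity(args.len() + 1);
    full.push(self_obj);
    // Copy all original values, positional and keyword alike. Copying only
    // `args[..nargs]` would leave nothing for the keyword names to pair with.
    full.extend_from_slice(args);
    (full, nargs + 1)
}
```

With two positionals and one keyword value, the result has four entries and a positional count of three, leaving exactly one trailing value for the single keyword name.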
@ShaharNaveh

Contributor

Super excited about this one ngl

@youknowone youknowone marked this pull request as ready for review March 3, 2026 14:33

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

5695-5816: Consider extracting shared vectorcall argument marshalling into one helper.

execute_call_vectorcall and execute_call_kw_vectorcall duplicate stack-index math and vector construction. A small internal helper for shared stack slicing/consumption would reduce drift risk and simplify future fixes.

As per coding guidelines "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 5695 - 5816, Duplicate stack-index math
and args construction in execute_call_vectorcall and execute_call_kw_vectorcall;
extract a small helper to centralize that logic. Create an internal method
(e.g., collect_vectorcall_args) used by both execute_call_vectorcall and
execute_call_kw_vectorcall that: computes
callable_idx/self_or_null_idx/args_start from nargs and optional kw_count,
consumes the stack values (including self_or_null if present and kw values) into
a Vec<PyObjectRef>, returns (args_vec, effective_nargs, optional_kwnames) and
leaves the stack truncated at callable_idx; then call
callable_obj.vectorcall(args_vec, effective_nargs, kwnames_opt, vm) and
push_value(result). Reuse existing helpers like
collect_positional_args/pop_multiple/pop_value_opt where appropriate and update
both functions to call this new helper and remove the duplicated index math and
loop logic.
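
One possible shape for the suggested helper is sketched below on a toy stack. The stack layout (callable, then self_or_null, then `nargs` values of which the last `kw_count` are keyword values) is an assumption made for illustration, as are the `Slot` alias and the exact return type; the real frame code may differ.

```rust
/// Stack slot in this toy model; `None` plays the role of a NULL `self_or_null` entry.
type Slot = Option<String>;

/// Assumed layout, top of stack last: callable, self_or_null, then `nargs`
/// values, the final `kw_count` of which are keyword values.
/// Returns (callable, args, positional_count) and truncates the stack at the
/// callable's index. Panics if the stack is shorter than the layout requires.
fn collect_vectorcall_args(
    stack: &mut Vec<Slot>,
    nargs: usize,
    kw_count: usize,
) -> (String, Vec<String>, usize) {
    let args_start = stack.len() - nargs;
    let self_or_null_idx = args_start - 1;
    let callable_idx = self_or_null_idx - 1;

    // Draining from `callable_idx` both consumes the values and truncates the stack.
    let mut tail = stack.drain(callable_idx..);
    let callable = tail.next().flatten().expect("callable slot must be filled");
    let self_or_null = tail.next().flatten();
    let has_self = self_or_null.is_some();

    let mut args = Vec::with_capacity(nargs + 1);
    args.extend(self_or_null); // contributes zero or one element
    args.extend(tail.map(|slot| slot.expect("argument slots must be filled")));

    let positional = nargs - kw_count + usize::from(has_self);
    (callable, args, positional)
}
```

Both the plain and the keyword call paths could then share this function, passing `kw_count = 0` for the plain case.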
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 1280-1289: The vectorcall fast path currently eagerly allocates a
new dict for NEWLOCALS (via vm.ctx.new_dict() and ArgMapping::from_dict_exact)
before creating the Frame; change this to use a None-local mapping so Scope::new
receives None when code.flags contains bytecode::CodeFlags::NEWLOCALS, matching
invoke_with_locals/invoke_exact_args behavior and allowing Frame initialization
to use FrameLocals::lazy() (defer allocation). Update the locals binding used by
Frame::new (the locals variable) to be an Option that is None for NEWLOCALS
instead of constructing vm.ctx.new_dict().

In `@crates/vm/src/function/argument.rs`:
- Around line 146-151: In from_vectorcall validate the vectorcall layout before
any unchecked slicing: ensure nargs <= args.len() and if kwnames.is_some() then
nargs + kwnames.len() <= args.len(), returning a Python TypeError/ValueError
(Err) instead of panicking; replace the .expect() downcast on kwnames entries
with safe handling (use .get() or iterate with .zip and match on PyString
downcast) and return an Err TypeError when a keyword name is not a string;
update construction of pos_args and kwargs to use checked slices (or .get()
results) so no unchecked indexing or .expect() can panic the VM.
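
The layout validation requested for from_vectorcall could follow a pattern like the one below: checked slicing with `get` and an error return in place of a panic. This is a sketch, not the actual fix. `checked_split` and `LayoutError` are hypothetical, `&str` replaces `PyObjectRef`, and the keyword-name downcast check is not modelled because the names here are already strings.

```rust
/// Ways the flat argument layout can be inconsistent (hypothetical error type).
#[derive(Debug, PartialEq)]
enum LayoutError {
    /// `nargs` exceeds the number of supplied values.
    TooFewPositional,
    /// Fewer values follow the positionals than there are keyword names.
    TooFewKeywordValues,
}

/// Split a vectorcall-style slice, returning an error on a bad layout.
fn checked_split<'a>(
    args: &[&'a str],
    nargs: usize,
    kwnames: &[&'a str],
) -> Result<(Vec<&'a str>, Vec<(&'a str, &'a str)>), LayoutError> {
    // `get` yields None instead of panicking when the range is out of bounds.
    let positional = args.get(..nargs).ok_or(LayoutError::TooFewPositional)?;
    // Safe to index: `nargs <= args.len()` was established on the line above.
    let kw_values = args[nargs..]
        .get(..kwnames.len())
        .ok_or(LayoutError::TooFewKeywordValues)?;
    let keywords = kwnames
        .iter()
        .copied()
        .zip(kw_values.iter().copied())
        .collect();
    Ok((positional.to_vec(), keywords))
}
```

In the VM the error branches would presumably map to a Python TypeError, as the comment suggests, rather than to a Rust enum.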



📥 Commits

Reviewing files that changed from the base of the PR and between 257b0c0 and 4d8cf97.

📒 Files selected for processing (7)
  • crates/vm/src/builtins/builtin_func.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/function/argument.rs
  • crates/vm/src/protocol/callable.rs
  • crates/vm/src/types/slot.rs
  • crates/vm/src/types/slot_defs.rs

@youknowone youknowone merged commit be0c3ca into RustPython:main Mar 3, 2026
13 checks passed
@youknowone youknowone deleted the skip-class branch March 3, 2026 14:51
youknowone added a commit to youknowone/RustPython that referenced this pull request Mar 22, 2026
* Add vectorcall (PEP 590) dispatch for function calls

Add VectorCallFunc slot to PyTypeSlots and vectorcall dispatch path
in the interpreter loop for Call and CallKw instructions.

Implement vectorcall for PyFunction (with fast path for simple
positional-only calls that fills fastlocals directly), PyBoundMethod
(avoids prepend_arg O(n) shift), and PyNativeFunction.

Add FuncArgs::from_vectorcall helper for fallback conversion.
Vectorcall slot is inherited with call slot and cleared when
__call__ is overridden in Python subclasses.

* Optimize vectorcall: move args instead of clone, use vectorcall in specialized paths

- invoke_exact_args takes Vec by value and uses drain() to move args
  into fastlocals instead of cloning (eliminates refcount overhead)
- CallPyGeneral and CallBoundMethodGeneral now call vectorcall_function
  directly instead of going through FuncArgs + prepend_arg + invoke
- CallKwPy and CallKwBoundMethod use vectorcall_function with kwnames
- vectorcall_bound_method uses insert(0) on existing Vec instead of
  allocating a second Vec

* Auto-format: cargo fmt --all

* Fix vectorcall_native_function kwarg slice out-of-bounds

When needs_self was true and kwargs were present, pos_args only
contained positional args (self + original positionals) but
from_vectorcall expected kwarg values to follow in the slice.

Build the full args array (self + all original args including kwarg
values) before passing to from_vectorcall.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>