gh-87613: Argument Clinic vectorcall decorator #145381
Add `@vectorcall` as a decorator to Argument Clinic (AC) which generates a new [Vectorcall Protocol](https://docs.python.org/3/c-api/call.html#the-vectorcall-protocol) argument parsing C function named `{}_vectorcall`. This is currently only supported for `__new__` and `__init__`, to simplify the implementation.

The generated code has performance similar to or better than the existing hand-written cases for `list`, `float`, `str`, `tuple`, `enumerate`, `reversed`, and `int`. Using the decorator added vectorcall to `bytearray`, and construction got 1.09x faster. For more details, see the comments in gh-87613.

The `@vectorcall` decorator has two options:

- **zero_arg={C_FUNC}**: Some types, like `int`, can be called with zero arguments and return an immortal object in that case. A shortcut for this case is needed to match existing hand-written performance; it improves performance by over 10% for those cases.
- **exact_only**: If the type is not an exact match, delegate to the existing non-vectorcall implementation. Needed for `str` to get matching performance while ensuring correct behavior.

Implementation details:

- Adds support for the new decorator with arguments in the AC DSL parser
- Moves keyword argument parsing generation from inline code into a function so both the vectorcall (`vc_`) and existing paths can share code generation
- Adds an `emit` helper to simplify code a bit from existing AC cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
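The following is a minimal pure-Python model of the control flow that a `@vectorcall`-generated constructor might follow. It is meant to make the two options concrete and is a sketch under assumptions, not the C code AC emits.

- The C vectorcall signature is `(callable, args, nargsf, kwnames)`. Here it is mirrored as `(type_, args, nargs, kwnames)`.
- `make_vectorcall` and `generic_parse_and_new` are hypothetical names.
- The zero-argument shortcut is assumed to apply only to the exact type, because a subclass must still receive a fresh instance of itself.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical shape of a vectorcall-style constructor in this model:
# (type being called, flat argument array, positional count, keyword names).
VectorcallLike = Callable[[type, Sequence[Any], int, Tuple[str, ...]], Any]


def make_vectorcall(
    cls: type,
    parse_and_new: VectorcallLike,
    zero_arg: Optional[Callable[[], Any]] = None,
    exact_only: bool = False,
) -> VectorcallLike:
    """Return a model of a @vectorcall constructor for ``cls``."""

    def vectorcall(type_: type, args: Sequence[Any], nargs: int,
                   kwnames: Tuple[str, ...] = ()) -> Any:
        if exact_only and type_ is not cls:
            # exact_only: subclasses take the ordinary (non-vectorcall) path,
            # with the trailing keyword values rebuilt into a dict.
            kwargs = dict(zip(kwnames, args[nargs:]))
            return type_(*args[:nargs], **kwargs)
        if zero_arg is not None and type_ is cls and nargs == 0 and not kwnames:
            # zero_arg: e.g. int() can hand back a cached object directly.
            return zero_arg()
        # Otherwise parse the arguments and construct the object.
        return parse_and_new(type_, args, nargs, kwnames)

    return vectorcall


def generic_parse_and_new(type_: type, args: Sequence[Any], nargs: int,
                          kwnames: Tuple[str, ...] = ()) -> Any:
    # Stand-in for the generated parser: split positional and keyword parts.
    kwargs = dict(zip(kwnames, args[nargs:]))
    return type_(*args[:nargs], **kwargs)


int_vc = make_vectorcall(int, generic_parse_and_new, zero_arg=lambda: 0)
print(int_vc(int, ["ff", 16], 1, ("base",)))  # 255
print(int_vc(int, [], 0))  # 0, via the zero_arg shortcut


class MyStr(str):
    pass


str_vc = make_vectorcall(str, generic_parse_and_new, exact_only=True)
print(type(str_vc(MyStr, ["abc"], 1)).__name__)  # MyStr, via the fallback path
```

The sketch keeps the `type_ is cls` guard on the shortcut on purpose. Returning a cached `0` for a call like `MyInt()` would hand back an object of the wrong type.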
corona10
left a comment
Could you replace the current hand-written implementations with your new DSL?
Let's see how it handles them.
I have commits that do that in my draft branch (https://github.com/python/cpython/compare/main...cmaloney:cpython:ac_vectorcall_v1?expand=0); I can pull them into this branch if that would be easier to review. The generated code is generally as fast as or faster than the current hand-written implementations (full benchmarks in #87613 (comment)).
Added commits moving
I will take a look at this PR by the end of this week.
I would suggest at least considering complexobject.c. Based on the benchmarks for the float PR (#22432), I would expect a good performance boost (maybe not 1.5x, but more than from the freelist addition). Yes, this case already seems to be covered by the enum.c example (kwargs). On the other hand, the complex class has special hacks to support multiple signatures (
Still, there are some regressions, e.g. int(str). Could you explain this difference? I also suggest trying pyperformance on this.
Will implement it in my draft branch this week. As part of developing this PR I added Vectorcall Protocol support to
The
Lines 6539 to 6559 in c9a5d9a
That switch specializes 1-argument to call
I'm comparing to the hand-written ones because I want
Will run it on this PR as it currently exists. I can also run it on my draft branch, but I'm not sure that will give a clear signal, since that branch migrates every hand-written vectorcall even where doing so makes it slower. Ideally I would be able to figure out which types are commonly constructed in the pyperformance benchmarks so I can make a draft branch adding
Did you consider adding this implicitly where supported, instead of making it opt-in? Disclaimer: I haven't looked at the implementation yet.
Yes, I hope that can mitigate the Decimal speed regression since v3.13. Edit: no, this doesn't help much.
In any case, this will happen on a case-by-case basis (this PR should include a minimal set of such examples). BTW, here are my results for this PR + c14c173. Benchmark code: cmaloney@ddcd3b6, run with --rigorous:
Benchmark hidden because not significant (9): list(), list_subclass, float(), float(str), bytearray(), tuple(list), int(), reversed(list), enumerate(list)

With default ./configure:
BTW, I wonder how noisy your benchmarks are. Here is an alternative approach with `bench_func()`:

```python
# vectorcall-bench.py
import pyperf

runner = pyperf.Runner()
bench_cases = ['1<<7', '1<<38', '1<<300', '1<<3000']

# int(int): construct an int from ints of increasing size.
for c in bench_cases:  # XXX: bigger sample
    i = eval(c)
    bn = f'int({c})'
    runner.bench_func(bn, int, i)

# int(str): parse the decimal string form of the same values.
for c in bench_cases:
    i = eval(c)
    s = str(i)
    bn = f'int({c!r})'
    runner.bench_func(bn, int, s)
```

As before, all optimizations:
Default:
I agree. But if the auto-generated code catches the major patterns of the current hand-written functions, that will be great.
No, I don't think it makes much sense to do a lot of conversions to AC magic in one shot.
My leaning is explicit, at least to start. I think that provides a good path for gradual adoption, testing, and rollout (hopefully in the 3.15 timeframe). I'd really like at least one alpha with a couple of common types (e.g. bytes) moved over to get wider testing and make sure there aren't unanticipated tradeoffs or issues.
vstinner
left a comment
Would it be possible to add a @vectorcall test to Modules/_testclinic.c?
This PR is very promising! Great work.
Added a vectorcall test to the

Debating whether a Hypothesis test of "does this parse the right args + kwargs" would provide enough value to justify the complexity.
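A seeded, randomized differential check is one lighter-weight way to get part of the value of such a test without adding a dependency. It calls the same constructor with positional, mixed, and keyword spellings and requires identical results.

This sketch makes a few substitutions:

- It uses the standard-library `random` module in place of Hypothesis.
- It uses `int(x, base)` and `str(object, encoding)` as example targets.
- `to_base` and `check_constructor_spellings` are hypothetical helpers, not part of the PR.

```python
import random

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(n: int, base: int) -> str:
    """Format a non-negative integer in the given base (2 to 36)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(DIGITS[r])
    return "".join(reversed(digits))


def check_constructor_spellings(trials: int = 1000, seed: int = 0) -> int:
    """Randomized differential check: positional and keyword spellings of
    the same constructor call must produce equal results."""
    rng = random.Random(seed)
    for _ in range(trials):
        # int(x, base): positional vs keyword base.
        n = rng.randrange(1 << rng.randrange(1, 200))
        base = rng.randrange(2, 37)
        s = to_base(n, base)
        assert int(s, base) == int(s, base=base) == n
        # str(object, encoding): all positional, mixed, all keyword.
        data = bytes(rng.randrange(128) for _ in range(rng.randrange(20)))
        expected = data.decode("ascii")
        assert str(data, "ascii") == expected
        assert str(data, encoding="ascii") == expected
        assert str(object=data, encoding="ascii") == expected
    return trials


print(check_constructor_spellings())  # 1000
```

Fixing the seed keeps any failure reproducible. Hypothesis would add shrinking of failing inputs on top of this.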
Full pyperformance numbers for the current PR are below. No cases stand out to me; everything seems to be within noise on my machine. That is

Not sure what will make this easier to review. I am happy to resolve the merge conflict anytime but also don't want to disturb in-progress reviews. I am thinking of scoping this down to just

pyperformance comparison: 3484ef6 (just before) vs HEAD (vectorcall clinic)
Platform: Linux-6.19.9-arch1-1-x86_64-with-glibc2.43 | 32 logical CPUs
This PR is stale because it has been open for 30 days with no activity.
Conflicts resolved:

- Objects/enumobject.c: kept AC-generated vectorcall (discarded upstream's manual `enumerate_vectorcall`, superseded by `@vectorcall` decorator)
- Modules/clinic/_testclinic.c.h: kept HEAD's AC-generated test vectorcall code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eendebakpt
left a comment
Three minor nits, but this looks like a nice improvement.
…age of generated cases
Add `@vectorcall` as a decorator to Argument Clinic (AC) which emits a Vectorcall Protocol argument parsing C function named `{type}_vectorcall`. This is currently only supported for `__new__` and `__init__`, to simplify the implementation.

The generated code has performance similar to or better than the existing hand-written cases for `list`, `float`, `str`, `tuple`, `enumerate`, `reversed`, and `int`. Using the decorator on `bytearray`, which has no hand-written case, made construction 1.09x faster. For more benchmark details, see #87613 (comment).

The `@vectorcall` decorator has two options:

- `zero_arg={C_FUNC}`: Some types, like `int`, can be called with zero arguments and return an immortal object in that case. A shortcut for this case is needed to match existing hand-written performance; it improves performance by over 10% for those cases.
- `exact_only`: If the type is not an exact match, delegate to the existing non-vectorcall implementation. Needed for `str` to get matching performance while ensuring correct behavior.

Implementation details:

- … `vc_`, and existing can share code generation.
- Adds an `emit` helper to simplify code a bit from existing AC cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
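Under the vectorcall calling convention, keyword values sit in the same flat array as the positional arguments, and their names arrive separately in `kwnames`. Any shared keyword-argument parsing has to handle that layout.

The sketch below models the binding step in Python:

- Positionals fill parameters left to right.
- Keyword values are matched by name.
- Defaults fill whatever is left.

`parse_vectorcall_args` and its error messages are hypothetical. They are not taken from Argument Clinic or from CPython's C argument-parsing helpers.

```python
from typing import Any, Dict, Sequence, Tuple


def parse_vectorcall_args(
    fname: str,
    params: Sequence[str],
    defaults: Dict[str, Any],
    args: Sequence[Any],
    nargs: int,
    kwnames: Tuple[str, ...] = (),
) -> Dict[str, Any]:
    """Bind vectorcall-style arguments to named parameters.

    args[:nargs] are the positional arguments; args[nargs:] are keyword
    values whose names are given, in the same order, by kwnames.
    """
    if len(args) != nargs + len(kwnames):
        raise ValueError("args must hold nargs positionals plus one value per kwname")
    if nargs > len(params):
        raise TypeError(
            f"{fname}() takes at most {len(params)} arguments ({nargs} given)")
    # Positional arguments fill parameters left to right.
    bound: Dict[str, Any] = dict(zip(params, args[:nargs]))
    # Keyword values follow the positionals in the same flat array.
    for name, value in zip(kwnames, args[nargs:]):
        if name not in params:
            raise TypeError(f"{fname}() got an unexpected keyword argument '{name}'")
        if name in bound:
            raise TypeError(f"{fname}() got '{name}' both by name and by position")
        bound[name] = value
    # Anything still unbound must have a default.
    for name in params:
        if name not in bound:
            if name not in defaults:
                raise TypeError(f"{fname}() missing required argument '{name}'")
            bound[name] = defaults[name]
    return bound


# Roughly enumerate("ab", start=5): one positional, one keyword.
print(parse_vectorcall_args(
    "enumerate", ["iterable", "start"], {"start": 0}, ["ab", 5], 1, ("start",)))
# {'iterable': 'ab', 'start': 5}
```

Because the names and values stay separate, a purely positional call never needs to build a keyword dict. That is one reason vectorcall can be cheaper than the `tp_call` path.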