gh-106581: Split CALL_PY_EXACT_ARGS into uops#107760
Conversation
brandtbucher
left a comment
There was a problem hiding this comment.
Thanks for tackling this, it definitely doesn't look easy. It's sort of a bummer that we need to special-case this much stuff, but I also don't see a nicer way of handling these issues than what you have here.
A few comments and questions, mostly for my own understanding:
Sorry, something went wrong.
gvanrossum
left a comment
There was a problem hiding this comment.
I'll add that assert; then I'll review your PR, and hopefully you can then merge that, and I can handle the merge fallout.
Sorry, something went wrong.
|
Made this back into a draft; I need to (a) wait for Brandt's gh-107788, then (b) redo the split and tooling changes using Mark's ideas. |
Sorry, something went wrong.
|
The |
Sorry, something went wrong.
This is only the first step for doing `CALL` in Tier 2. The next step involves tracing into the called code object. After that we'll have to do the remaining `CALL` specialization. Finally we'll have to tweak various things like `KW_NAMES`, and possibly move the `NULL` (for method calls) *above* the callable. But those are things for future PRs. Note: this moves setting `frame->return_offset` directly in front of `DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
|
Closing and re-opening to retrigger CLA checks. Sorry for the noise. |
Sorry, something went wrong.
Instead, the special case is an opcode using SAVE_FRAME_STATE(). Introducing #if TIER_ONE and #if TIER_TWO so we can implement _PUSH_FRAME differently for both tiers.
Instead, we special-case SAVE_IP: - Its Tier 2 expansion sets oparg to the instruction offset - In Tier 1 it is a no-op (and skipped if present in a macro)
|
@markshannon I was hoping you'd review this. I added Unless you'd rather review #107925, which includes this (and #107793, which is the intermediate stage). |
Sorry, something went wrong.
markshannon
left a comment
There was a problem hiding this comment.
I'm uneasy about the introduction of the TIER_ONE and TIER_TWO macros.
It is a principle of the overall design that there is a single source of truth for the semantics of bytecodes.
It might appear that I'm being dogmatic, but the need for something like those macros often indicates an underlying problem that should be fixed independently.
In this case the problem is the cframe. Loading and saving the IP needs to handled specially anyway and saving and loading the SP should be the same for both interpreters (but will need to be handled specially by the copy-and-patch compiler, so should be its own micro-op).
It is pushes the frame that differs. Removing cframe will fix that.
The cframe only exists as a performance hack to minimize the impact of tracing prior to PEP 669.
Sorry, something went wrong.
|
Benchmark: 1.00x faster: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20230816-3.13.0a0-05af848 IOW it doesn't slow CALL_PY_EXACT_ARGS down, which is all I care about. |
Sorry, something went wrong.
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️Hi! The buildbot wasm32-emscripten node (pthreads) 3.x has failed when building commit dc8fdf5. What do you need to do:
You can take a look at the buildbot page here: https://buildbot.python.org/all/#builders/1050/builds/2796 Summary of the results of the build (if available): == Tests result: ENV CHANGED == 329 tests OK. 10 slowest tests:
1 test altered the execution environment: 117 tests skipped: Total duration: 26 min 4 sec Click to see traceback logsTraceback (most recent call last):
File "/opt/buildbot/bcannon-wasm/3.x.bcannon-wasm.emscripten-node-pthreads/build/Lib/test/test_capi/test_watchers.py", line 532, in watcher
raise MyError("testing 123")
|
Sorry, something went wrong.
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
This is only the first step for doing
CALLin Tier 2. The next step involves tracing into the called code object. After that we'll have to do the remainingCALLspecialization. Finally we'll have to tweak various things likeKW_NAMES, and possibly move theNULL(for method calls) above the callable (that's 107788). But those are things for future PRs.Note: this moves setting
frame->return_offsetdirectly in front ofDISPATCH_INLINED(), to make it easier to move it into_PUSH_FRAME.CALL's stack #107788_Py_EnterRecursivePy