gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter#111428
gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter#111428gvanrossum merged 27 commits into
Conversation
Use `GOTO_ERROR(error)` instead of `goto error`. This macro is defined differently in the tier two interpreter.
This replaces PYTHONUOPSDEBUG=N. The meaning of N is the same (for now): 0: no tracing 1: print when tier 2 trace created 2: print contents of tier 2 trace 3: print every uop executed 4: print optimization attempts and details 5: print tier 1 instructions and stack
|
With uops always enabled, only test_embed fails, but that's the same on main (see GH-111339). I added that to benchmark Tier 2 perf (who knows, the speedier Tier switching may make things faster there). |
Sorry, something went wrong.
|
@markshannon Can you review this? If you think this is roughly going in the right direction I will clean it up. |
Sorry, something went wrong.
markshannon
left a comment
There was a problem hiding this comment.
Looks promising.
Sorry, something went wrong.
- Remove CHECK_EVAL_BREAKER() from top of Tier 2 loop - Make Tier 2 default case Py_UNREACHABLE() in non-debug mode - GOTO_TIER_TWO() macro instead of `goto enter_tier_two` - Inline ip_offset (Tier 2 LOAD_IP() is now a no-op) - Move #define/#under TIER_ONE/TIER_TWO into generated .c.h files - ceval_macros.h defines Tier 1 macros, Tier 2 is inlined in ceval.c
gvanrossum
left a comment
There was a problem hiding this comment.
I consider this sufficiently cleaned up to undraft it, and if @markshannon doesn't comment I'll merge tomorrow.
I could noodle endlessly with the code for dropping from tier 2 back into tier 1, with and without errors; but it may be better to put that off until a more careful review.
Sorry, something went wrong.
|
Whoops, back to draft, I accidentally left in the uops-forever cherrypick. Will fix later. |
Sorry, something went wrong.
(I accidentally kept this commit after pushing it experimentally. This means I've been testing with uops on all the time, which is actually pretty amazing.) This reverts commit e0e60ce.
Using `GOTO_TIER_ONE()` macro. This should make things simpler for Justin.
The tests run against debug builds, which will have very different stack usage patterns compared to optimised builds. I wouldn't worry too much about changing anything other than the default recursion limit (which is different for debug vs. release), and then do a buildbot run to check the optimised builds. Otherwise, the best way to reduce stack usage will be to reduce the size of local variables, possibly by refactoring into separate functions so that the variables are only allocated temporarily and don't remain on the stack as the Python code recurses. |
Sorry, something went wrong.
But the whole exercise is to merge two functions in one because we want to be able to transfer back and forth using I will see by how much the debug recursion limit needs to be adjusted to make the tests pass; I think I have only two extra local variables left. |
Sorry, something went wrong.
|
🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit fdf1a2f 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Sorry, something went wrong.
|
If this now passes the tests and the buildbots don't freak out I am going to merge this. However, @markshannon, could you have a look at how I fixed |
Sorry, something went wrong.
|
Another fix would be to increase the value on this line: cpython/PCbuild/python.vcxproj Line 98 in 5d6db16 That's what determines how much stack is available, and running out is what causes the hard crash. As you can see, it's already 4x larger for debug builds than releases, but it's still only 8MB and so making it larger isn't really going to hurt anyone much. (I'll note that I haven't looked through the whole PR, so I don't know exactly what is causing it or whether the existing recursion limit comes into play at all. I'm just throwing out helpful pointers and trusting that you guys know what's actually going on :) ) |
Sorry, something went wrong.
Ah, thanks. That makes more sense. I don't know exactly what's going on either, but I know that I've added some local variables to the function containing the main interpreter loop, and that causes hard crashes in a bunch of tests that recurse deeply (at the C level). I'm now applying your fix instead of my previous hacks -- except I'm keeping the bit in I've just pushed that and if tests still pass I'll merge. |
Sorry, something went wrong.
…preter (python#111428) - There is no longer a separate Python/executor.c file. - Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`, you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`). - The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files. - In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`. - On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB. - **Beware!** This changes the env vars to enable uops and their debugging to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
…preter (python#111428) - There is no longer a separate Python/executor.c file. - Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`, you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`). - The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files. - In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`. - On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB. - **Beware!** This changes the env vars to enable uops and their debugging to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
…preter (python#111428) - There is no longer a separate Python/executor.c file. - Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`, you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`). - The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files. - In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`. - On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB. - **Beware!** This changes the env vars to enable uops and their debugging to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
See also faster-cpython/ideas#631
Beware! This changes the env vars to enable uops and their debugging to
PYTHON_UOPSandPYTHON_LLTRACE.