◐ Shell
reader mode source ↗
Skip to content

gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter#111428

Merged
gvanrossum merged 27 commits into
python:mainfrom
gvanrossum:mix-tiers
Nov 1, 2023
Merged

gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter#111428
gvanrossum merged 27 commits into
python:mainfrom
gvanrossum:mix-tiers

Conversation

@gvanrossum

@gvanrossum gvanrossum commented Oct 28, 2023

Copy link
Copy Markdown
Member

See also faster-cpython/ideas#631

Beware! This changes the env vars to enable uops and their debugging to PYTHON_UOPS and PYTHON_LLTRACE.

Use `GOTO_ERROR(error)` instead of `goto error`.
This macro is defined differently in the tier two interpreter.
This replaces PYTHONUOPSDEBUG=N. The meaning of N is the same (for now):

  0: no tracing
  1: print when tier 2 trace created
  2: print contents of tier 2 trace
  3: print every uop executed
  4: print optimization attempts and details
  5: print tier 1 instructions and stack
@gvanrossum

Copy link
Copy Markdown
Member Author

With uops always enabled, only test_embed fails, but that's the same on main (see GH-111339). I added that to benchmark Tier 2 perf (who knows, the speedier Tier switching may make things faster there).

@gvanrossum

Copy link
Copy Markdown
Member Author

@markshannon Can you review this? If you think this is roughly going in the right direction I will clean it up.

@markshannon markshannon left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Looks promising.

@gvanrossum gvanrossum changed the title Integrate the Tier 2 interpreter in the Tier 1 interpreter Oct 30, 2023
- Remove CHECK_EVAL_BREAKER() from top of Tier 2 loop
- Make Tier 2 default case Py_UNREACHABLE() in non-debug mode
- GOTO_TIER_TWO() macro instead of `goto enter_tier_two`
- Inline ip_offset (Tier 2 LOAD_IP() is now a no-op)
- Move #define/#under TIER_ONE/TIER_TWO into generated .c.h files
- ceval_macros.h defines Tier 1 macros, Tier 2 is inlined in ceval.c

@gvanrossum gvanrossum left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

I consider this sufficiently cleaned up to undraft it, and if @markshannon doesn't comment I'll merge tomorrow.

I could noodle endlessly with the code for dropping from tier 2 back into tier 1, with and without errors; but it may be better to put that off until a more careful review.

@gvanrossum gvanrossum marked this pull request as ready for review November 1, 2023 00:23
@gvanrossum gvanrossum requested a review from a team as a code owner November 1, 2023 00:23
@gvanrossum gvanrossum marked this pull request as draft November 1, 2023 00:42
@gvanrossum

Copy link
Copy Markdown
Member Author

Whoops, back to draft, I accidentally left in the uops-forever cherrypick. Will fix later.

(I accidentally kept this commit after pushing it experimentally.
This means I've been testing with uops on all the time, which is
actually pretty amazing.)

This reverts commit e0e60ce.
@gvanrossum gvanrossum marked this pull request as ready for review November 1, 2023 04:35
8 hidden items Load more…
@zooba

zooba commented Nov 1, 2023

Copy link
Copy Markdown
Member

Alas, we still run out of stack space on the recursion tests on Windows. I'll keep working on this.

The tests run against debug builds, which will have very different stack usage patterns compared to optimised builds. I wouldn't worry too much about changing anything other than the default recursion limit (which is different for debug vs. release), and then do a buildbot run to check the optimised builds.

Otherwise, the best way to reduce stack usage will be to reduce the size of local variables, possibly by refactoring into separate functions so that the variables are only allocated temporarily and don't remain on the stack as the Python code recurses.

@gvanrossum

gvanrossum commented Nov 1, 2023

Copy link
Copy Markdown
Member Author

Otherwise, the best way to reduce stack usage will be to reduce the size of local variables, possibly by refactoring into separate functions so that the variables are only allocated temporarily and don't remain on the stack as the Python code recurses.

But the whole exercise is to merge two functions in one because we want to be able to transfer back and forth using goto rather than calls!

I will see by how much the debug recursion limit needs to be adjusted to make the tests pass; I think I have only two extra local variables left.

@gvanrossum gvanrossum added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 1, 2023
@bedevere-bot

Copy link
Copy Markdown

🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit fdf1a2f 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 1, 2023
@AlexWaygood AlexWaygood removed their request for review November 1, 2023 18:30
@gvanrossum

Copy link
Copy Markdown
Member Author

If this now passes the tests and the buildbots don't freak out I am going to merge this. However, @markshannon, could you have a look at how I fixed test_call? This feels like a band-aid.

@zooba

zooba commented Nov 1, 2023

Copy link
Copy Markdown
Member

Another fix would be to increase the value on this line:

<StackReserveSize Condition="$(Configuration) == 'Debug'">8000000</StackReserveSize>

That's what determines how much stack is available, and running out is what causes the hard crash. As you can see, it's already 4x larger for debug builds than releases, but it's still only 8MB and so making it larger isn't really going to hurt anyone much.

(I'll note that I haven't looked through the whole PR, so I don't know exactly what is causing it or whether the existing recursion limit comes into play at all. I'm just throwing out helpful pointers and trusting that you guys know what's actually going on :) )

@gvanrossum

Copy link
Copy Markdown
Member Author

Another fix would be to increase the value on this line:

Ah, thanks. That makes more sense. I don't know exactly what's going on either, but I know that I've added some local variables to the function containing the main interpreter loop, and that causes hard crashes in a bunch of tests that recurse deeply (at the C level). I'm now applying your fix instead of my previous hacks -- except I'm keeping the bit in test.support that dynamically pulls the value of Py_C_RECURSION_LIMIT1 from _testcapi (if it exists) rather than hardcoding a value that must be kept in sync with something defined in pystate.h. (It's technically still the case, but only if _testcapi can't be imported.)

I've just pushed that and if tests still pass I'll merge.

@gvanrossum gvanrossum merged commit 7e135a4 into python:main Nov 1, 2023
@gvanrossum gvanrossum deleted the mix-tiers branch November 1, 2023 20:13
FullteaR pushed a commit to FullteaR/cpython that referenced this pull request Nov 3, 2023
…preter (python#111428)

- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
  you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
  to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
…preter (python#111428)

- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
  you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
  to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
…preter (python#111428)

- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
  you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
  to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants