◐ Shell
reader mode source ↗
Skip to content

gh-104584: Support most jumping instructions#106393

Closed
gvanrossum wants to merge 9 commits into
python:mainfrom
gvanrossum:jump-uops
Closed

gh-104584: Support most jumping instructions#106393
gvanrossum wants to merge 9 commits into
python:mainfrom
gvanrossum:jump-uops

Conversation

@gvanrossum

@gvanrossum gvanrossum commented Jul 4, 2023

Copy link
Copy Markdown
Member
  • Supports all jumping opcodes except SEND
  • Includes FOR_ITER and its specializations, except FOR_ITER_GEN
  • "See through" ENTER_EXECUTOR
  • Change POP_JUMP_IF_TRUE/FALSE to only jump when needed
  • JUMP_BACKWARD to top of trace becomes a special JUMP_TO_TOP uop

A big weakness (?) is that this always assumes that branches are rarely taken. Any time a branch or jump (other than to the top of the trace) is taken, we leave the trace.

@gvanrossum

Copy link
Copy Markdown
Member Author

FWIW the Tier 2 interpreter now supports 109 bytecodes and 12 uops (the Tier 1 interpreter supports 202 bytecodes).

Details

Cases in Python/executor_cases.c.h:

        case NOP: {
        case LOAD_FAST_CHECK: {
        case LOAD_FAST: {
        case LOAD_FAST_AND_CLEAR: {
        case LOAD_CONST: {
        case STORE_FAST: {
        case POP_TOP: {
        case PUSH_NULL: {
        case END_SEND: {
        case UNARY_NEGATIVE: {
        case UNARY_NOT: {
        case TO_BOOL: {
        case TO_BOOL_BOOL: {
        case TO_BOOL_INT: {
        case TO_BOOL_LIST: {
        case TO_BOOL_NONE: {
        case TO_BOOL_STR: {
        case TO_BOOL_ALWAYS_TRUE: {
        case UNARY_INVERT: {
        case _GUARD_BOTH_INT: {
        case _BINARY_OP_MULTIPLY_INT: {
        case _BINARY_OP_ADD_INT: {
        case _BINARY_OP_SUBTRACT_INT: {
        case _GUARD_BOTH_FLOAT: {
        case _BINARY_OP_MULTIPLY_FLOAT: {
        case _BINARY_OP_ADD_FLOAT: {
        case _BINARY_OP_SUBTRACT_FLOAT: {
        case _GUARD_BOTH_UNICODE: {
        case _BINARY_OP_ADD_UNICODE: {
        case BINARY_SUBSCR: {
        case BINARY_SLICE: {
        case STORE_SLICE: {
        case BINARY_SUBSCR_LIST_INT: {
        case BINARY_SUBSCR_TUPLE_INT: {
        case BINARY_SUBSCR_DICT: {
        case LIST_APPEND: {
        case SET_ADD: {
        case STORE_SUBSCR: {
        case STORE_SUBSCR_LIST_INT: {
        case STORE_SUBSCR_DICT: {
        case DELETE_SUBSCR: {
        case CALL_INTRINSIC_1: {
        case CALL_INTRINSIC_2: {
        case GET_AITER: {
        case GET_ANEXT: {
        case GET_AWAITABLE: {
        case POP_EXCEPT: {
        case LOAD_ASSERTION_ERROR: {
        case LOAD_BUILD_CLASS: {
        case STORE_NAME: {
        case DELETE_NAME: {
        case UNPACK_SEQUENCE: {
        case UNPACK_SEQUENCE_TWO_TUPLE: {
        case UNPACK_SEQUENCE_TUPLE: {
        case UNPACK_SEQUENCE_LIST: {
        case UNPACK_EX: {
        case DELETE_ATTR: {
        case STORE_GLOBAL: {
        case DELETE_GLOBAL: {
        case _LOAD_LOCALS: {
        case _LOAD_FROM_DICT_OR_GLOBALS: {
        case LOAD_GLOBAL: {
        case DELETE_FAST: {
        case DELETE_DEREF: {
        case LOAD_FROM_DICT_OR_DEREF: {
        case LOAD_DEREF: {
        case STORE_DEREF: {
        case COPY_FREE_VARS: {
        case BUILD_STRING: {
        case BUILD_TUPLE: {
        case BUILD_LIST: {
        case LIST_EXTEND: {
        case SET_UPDATE: {
        case BUILD_SET: {
        case BUILD_MAP: {
        case SETUP_ANNOTATIONS: {
        case BUILD_CONST_KEY_MAP: {
        case DICT_UPDATE: {
        case DICT_MERGE: {
        case MAP_ADD: {
        case LOAD_SUPER_ATTR_ATTR: {
        case LOAD_SUPER_ATTR_METHOD: {
        case LOAD_ATTR: {
        case COMPARE_OP: {
        case COMPARE_OP_FLOAT: {
        case COMPARE_OP_INT: {
        case COMPARE_OP_STR: {
        case IS_OP: {
        case CONTAINS_OP: {
        case CHECK_EG_MATCH: {
        case CHECK_EXC_MATCH: {
        case JUMP_FORWARD: {
        case JUMP_BACKWARD: {
        case POP_JUMP_IF_FALSE: {
        case POP_JUMP_IF_TRUE: {
        case POP_JUMP_IF_NOT_NONE: {
        case POP_JUMP_IF_NONE: {
        case JUMP_BACKWARD_NO_INTERRUPT: {
        case GET_LEN: {
        case MATCH_CLASS: {
        case MATCH_MAPPING: {
        case MATCH_SEQUENCE: {
        case MATCH_KEYS: {
        case GET_ITER: {
        case GET_YIELD_FROM_ITER: {
        case FOR_ITER: {
        case FOR_ITER_LIST: {
        case FOR_ITER_TUPLE: {
        case FOR_ITER_RANGE: {
        case WITH_EXCEPT_START: {
        case PUSH_EXC_INFO: {
        case EXIT_INIT_CHECK: {
        case MAKE_FUNCTION: {
        case SET_FUNCTION_ATTRIBUTE: {
        case BUILD_SLICE: {
        case CONVERT_VALUE: {
        case FORMAT_SIMPLE: {
        case FORMAT_WITH_SPEC: {
        case COPY: {
        case BINARY_OP: {
        case SWAP: {

@gvanrossum

Copy link
Copy Markdown
Member Author

FWIW the Tier 2 interpreter now supports 109 bytecodes and 12 uops (the Tier 1 interpreter supports 202 bytecodes).

It looks like if we can manage to arrange that a KW_NAMES instruction and the following CALL instruction or CALL specialization are always either part of the same trace or neither included in a trace, we could add over a dozen new bytecodes to the list (in particular, many of the simpler CALL specializations).

@markshannon markshannon left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

We want to support specialized instructions, like LOAD_ATTR_INSTANCE_VALUE, and simple instructions like LOAD_FAST, but we specifically do not want to support unspecialized instructions like LOAD_ATTR.

If those instructions show up, either we are projecting the superblock too early or, more likely, need better specialization.

Regarding jumps:

  • Unconditional jumps should be nops, simply relying on SAVE_IP to update the externally visible state
  • Conditional jumps should be converted to conditional exits and unconditional jumps.

@gvanrossum gvanrossum left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Thanks, this gave me food for thought.

@markshannon markshannon left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Handling branches is a bit tricky, so it's OK to take a bit of time to get this right.

@gvanrossum gvanrossum marked this pull request as draft July 5, 2023 21:19
@gvanrossum

gvanrossum commented Jul 6, 2023

Copy link
Copy Markdown
Member Author

Drat, the super-instruction refactor is broken. But 1868f91 works great!

I'll debug later.

@markshannon

Copy link
Copy Markdown
Member

I think it might be worth breaking up this PR into 3:

  • The fix in handling errors in _PyOptimizer_BackEdge
  • Handling of superinstructions
  • Jumps.

The handling of jumps has some subtleties, so we might as well get the other changes in while we refine the design of jump handling.

@gvanrossum

Copy link
Copy Markdown
Member Author

I think it might be worth breaking up this PR into 3:

  • The fix in handling errors in _PyOptimizer_BackEdge
  • Handling of superinstructions
  • Jumps.

The handling of jumps has some subtleties, so we might as well get the other changes in while we refine the design of jump handling.

I might do that, though I suspect there are mute lurking bugs that we will only find by attempting to implement jumps. Also, jumps will be an iterative design anyway.

And where do you see handling EXTENDED_ARG fitting in?

@markshannon

Copy link
Copy Markdown
Member

And where do you see handling EXTENDED_ARG fitting in?

In another PR as well. I much prefer lots of small PRs. Large, draft PRs tend to block other progress.

@gvanrossum

Copy link
Copy Markdown
Member Author

In another PR as well. I much prefer lots of small PRs. Large, draft PRs tend to block other progress.

Will do. I wonder though -- is a draft PR more of a problem than a private branch? If you'd rather have me do more of the latter I'm fine with that, but draft PRs have the advantage that they run CI.

gvanrossum added a commit that referenced this pull request Jul 6, 2023
When `_PyOptimizer_BackEdge` returns `NULL`, we should restore `next_instr` (and `stack_pointer`). To accomplish this we should jump to `resume_with_error` instead of just `error`.

The problem this causes is subtle -- the only repro I have is in PR gh-106393, at commit d7df54b. But the fix is real (as shown later in that PR).

While we're at it, also improve the debug output: the offsets at which traces are identified are now measured in bytes, and always show the start offset. This makes it easier to correlate executor calls with optimizer calls, and either with `dis` output.

<!-- gh-issue-number: gh-104584 -->
* Issue: gh-104584
<!-- /gh-issue-number -->
@gvanrossum

Copy link
Copy Markdown
Member Author

I pushed a leaner version, but without any of the significant changes to the branch strategy, so it's still a draft. Everything else has been split out or is being split out.

@gvanrossum gvanrossum force-pushed the jump-uops branch 2 times, most recently from 8d07517 to c87f9d6 Compare July 7, 2023 04:07
Introduce a new macro, JUMP_POP_DISPATCH(x, n).
This does JUMPBY(x), STACK_SHRINK(n), DISPATCH().
Most JUMP opcodes can use this.
The exceptions are SEND, JUMP_BACKWARD, and JUMP_BACKWARD_NO_INTERRUPT.
For JUMP_BACKWARD, I have to research whether CHECK_EVAL_BREAKER() and JUMPBY() commute.
I think I'll just punt on SEND (it's too complex anyways).
This involves some subtle rearrangement of code in JUMP_BACKWARD.
This may destroy the symmetry or slow things down,
but (for now) it's needed so that the executor can
at least avoid bailing when the jump is not taken.
(The original code was doing a jump 0 in that case.)
If JUMP_BACKWARD jumps to the start of the trace, add this.
It contains an eval breaker check.
@gvanrossum

Copy link
Copy Markdown
Member Author

Okay, I'll close this. I'll create a new issue describing how I think branches could be handled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants