◐ Shell
clean mode source ↗

Branching design for Tier 2 (uops) interpreter

This issue is part of the larger epic of gh-104584. In PR gh-106393 I tried to implement branching, but it was premature. Here's a better design, following @markshannon's guidance.

We have the following jump instructions (not counting the instrumented versions):

Unconditional jumps:

  • JUMP_BACKWARD
  • JUMP_BACKWARD_NO_INTERRUPT
  • JUMP_FORWARD

Branches, a.k.a. conditional jumps:

  • POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE, POP_JUMP_IF_NONE, POP_JUMP_IF_NOT_NONE

  • FOR_ITER's specializations:

    • FOR_ITER_LIST
    • FOR_ITER_TUPLE
    • FOR_ITER_RANGE
  • FOR_ITER_GEN

  • SEND

  • Add counters to to POP_JUMP_IF_{FALSE,TRUE} to determine likeliness

The translation strategies could be as follows:

Unconditional jumps

JUMP_BACKWARD

  • If this jumps to exactly the top of the current trace, emit a Tier 2 JUMP_TO_TOP uop, and stop projecting (i.e., exit the trace generation loop). The JUMP_TO_TOP uop implementation should include a CHECK_EVAL_BREAKER call.
  • If this jumps anywhere else, emit a SAVE_IP uop with the destination of the jump, followed by an EXIT_TRACE uop, and stop projecting.

JUMP_BACKWARD_NO_INTERRUPT

  • Since this is typically only used in special circumstances, just emit a SAVE_IP instruction with the destination and an EXIT_TRACE uop, and stop projecting.
  • Alternatively, we could make CHECK_EVAL_BREAKER a separate UOP that is inserted for JUMP_BACKWARD but not for JUMP_BACKWARD_NO_INTERRUPT, and otherwise treat the two backward jumps the same.

JUMP_FORWARD

  • Emit a SAVE_IP uop with the destination of the jump, and continue projecting from there (i.e. set instr to the destination of the jump).

Conditional jumps (branches)

POP_JUMP_IF_FALSE and friends

Consider the following Python code:

This translates roughly to the following Tier 1 bytecode (using B1, B2, ... to label Tier 1 instructions, and <cond>, <block> etc. to represent code blocks that evaluate or execute the corresponding Python fragments):

B1: <cond>
B2: POP_JUMP_IF_FALSE B4
B3: <block>
B4: <rest>
B5:

I propose the following translation into Tier 2 uops, assuming the branch is "unlikely":

    SAVE_IP B1
    <cond>
    SAVE_IP B2
    JUMP_IF_FALSE stub
    POP_TOP
    SAVE_IP B3
    <block>
    SAVE_IP B4
    EXIT_TRACE

stub:
    POP_TOP
    SAVE_IP B4
    EXIT_TRACE

Where JUMP_IF_FALSE inspects the top of stack but doesn't pop it, and has an argument that executes a jump in the Tier 2 uop instruction sequence.

If the branch is "likely", we do this instead:

    SAVE_IP B1
    <cond>
    SAVE_IP B2
    JUMP_IF_TRUE stub
    POP_TOP
    SAVE_IP B4
    <rest>
    SAVE_IP B5
    EXIT_TRACE

stub:
    POP_TOP
    SAVE_IP B3
    EXIT_TRACE

Note how in this case, <rest> is projected as part of the trace, while <block> is not, since the likely case is that we jump over <block> to <rest>.

For the other simple conditional jumps (POP_JUMP_IF_TRUE, POP_JUMP_IF_NONE, POP_JUMP_IF_NOT_NONE) we do the same: if the jump is unlikely, emit a JUMP_IF_XXX uop and a stub; if the jump is likely, emit the inverse JUMP_IF_NOT_XXX uop and a different stub, and continue projecting at the destination of the original jump bytecode.

I propose to have hand-written cases both in the superblock generator and in the Tier 2 interpreter for these, since the translations are too irregular to fit easily in the macro expansion data structure. The stub generation will require additional logic and data structures in translate_bytecode_to_trace() to keep track of the stubs required so far, the available space for those, and the back-patching required to set the operands for the JUMP_IF_XXX uops.

FOR_ITER and (especially) its specializations

The common case for these is not to jump. The bytecode definitions are too complex to duplicate in hand-written Tier 2 uops. My proposal is to change these in bytecodes.c so that, instead of using the JUMPBY(n) macro, they use JUMPBY_POP_DISPATCH(n), which in Tier 1 translates into just JUMPBY(n), but in Tier 2 translates into roughly

frame->prev_instr += (x);
PY_DECREF(stack_pointer[-1]);
stack_pointer -= 1;
goto exit;

thereby exiting the trace when the corresponding for-loop terminates.

I am assuming here that most loops have several iterations. I don't think it's worth special-casing the occasional for-loop that almost always immediately terminates.

SEND

Possibly we could treat this the same as FOR_ITER. But right now I propose to just punt here, and when we encounter it, stop projecting, as we do with any other unsupported bytecode instruction.

Linked PRs