◐ Shell
clean mode source ↗

GH-96421: Don't keep local copies of `first_instr`, `names` and `consts` in the interpreter. by markshannon · Pull Request #96847 · python/cpython

I'm struggling to get good performance numbers on this. An earlier run gave me a 4% speedup, which is clearly wrong.

However, we can look at the stats and deduce that this should speed things up.

first_instr is never hot and can be computed cheaply as frame->f_code + OFFSET. Caching it is pointless.

The reasoning for names and consts is a bit more complex:

  • Approx 7% of instructions change the frame, this PR saves four loads (and if names and consts are spllled, two stores)
  • About 2% of instructions use names, this PR costs two loads, or one load if names is spilled.
  • About 8% of instructions use consts, this PR costs two loads, or one load if consts is spilled.

The saving is small if names and consts are not spilled, but a compiler would be stupid not to spill names and consts on x86-64.
So on RISC would expect a very small speedup, but on x86-64 we would expect a larger one. I would estimate in the 0.3-0.6% range.

Even if this results in a small slowdown, it should speedup #96319 and increase the potential benefit of adding LOAD_CONST_IMMORTAL

Skipping news as this change is undetectable, even for a intrusive C extension.

Chesterton's Fence, a.k.a why were these values stored in local variables before?

These variables used to speed up CPython, even they no longer do:

  • first_instr: Many jumps were absolute, so first_instr was needed to make these jumps.
  • names: LOAD_ATTR and LOAD_GLOBAL need names. Thanks to PEP 659, these instructions (and their adaptive forms) are much less common
  • consts: The existence of POP_JUMP_IF_NONE reduced the number of LOAD_CONSTs a bit, and RETURN_GENERATOR has increased the number of frame entry and exits. This may have tipped the balance, or maybe caching of consts was misguided before.