GH-100982: Break up `COMPARE_AND_BRANCH` by brandtbucher · Pull Request #102801 · python/cpython

COMPARE_AND_BRANCH (and its specializations) are a bit weird. It's basically an "adaptive superinstruction" that's present in co_code, and its semantics and lifecycle are different from those of any other instruction. Not only is it harder to reason about, but it also makes changes (like quickening in the compiler) more awkward.
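To see which comparison instructions a given build actually emits, a minimal sketch using the standard `dis` module is below. The helper name `opnames` is made up for illustration, and the opcode names printed depend on the interpreter version: a build with this change should show a COMPARE_OP followed by a separate conditional jump in the branching case, while a build from before it may show COMPARE_AND_BRANCH instead.

```python
import dis


def opnames(source):
    """Return the opcode names the running interpreter compiles `source` to.

    `opnames` is a hypothetical helper for this sketch, not part of CPython.
    """
    code = compile(source, "<example>", "eval")
    return [instr.opname for instr in dis.get_instructions(code)]


# The non-branching and branching forms used in the microbenchmarks.
for source in ("x == x", "None if x == x else None"):
    print(f"{source!r}: {opnames(source)}")
```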

However, COMPARE_AND_BRANCH appears to be only a small improvement over COMPARE_OP. This branch replaces the COMPARE_AND_BRANCH family with a "normal" COMPARE_OP one; here are some microbenchmarks for specializations with and without conditional jumps (non-debug, non-PGO):

main:

$ ./python -m timeit -s "x = 0.0" "x == x"
20000000 loops, best of 5: 12.7 nsec per loop
$ ./python -m timeit -s "x = 0" "x == x"
20000000 loops, best of 5: 11 nsec per loop
$ ./python -m timeit -s "x = '0'" "x == x"
20000000 loops, best of 5: 11.2 nsec per loop
$ ./python -m timeit -s "x = 0.0" "None if x == x else None"
20000000 loops, best of 5: 10.8 nsec per loop
$ ./python -m timeit -s "x = 0" "None if x == x else None"
20000000 loops, best of 5: 11.4 nsec per loop
$ ./python -m timeit -s "x = '0'" "None if x == x else None"
20000000 loops, best of 5: 11.3 nsec per loop

This branch:

$ ./python -m timeit -s "x = 0.0" "x == x"
50000000 loops, best of 5: 8.76 nsec per loop
$ ./python -m timeit -s "x = 0" "x == x"
50000000 loops, best of 5: 8.89 nsec per loop
$ ./python -m timeit -s "x = '0'" "x == x"
50000000 loops, best of 5: 8.83 nsec per loop
$ ./python -m timeit -s "x = 0.0" "None if x == x else None"
20000000 loops, best of 5: 11.6 nsec per loop
$ ./python -m timeit -s "x = 0" "None if x == x else None"
20000000 loops, best of 5: 11.8 nsec per loop
$ ./python -m timeit -s "x = '0'" "None if x == x else None"
20000000 loops, best of 5: 12 nsec per loop
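The six commands above can also be driven from a single script, which makes it easier to run the same cases against two builds. This is only a rough sketch: `CASES` and `best_nsec` are names invented here, and the fixed `number` of loops is an assumption (the `timeit` command line picks the loop count automatically), so absolute numbers will not match the transcripts exactly.

```python
import timeit

# (setup, statement) pairs mirroring the command-line runs above.
CASES = [
    ("x = 0.0", "x == x"),
    ("x = 0", "x == x"),
    ("x = '0'", "x == x"),
    ("x = 0.0", "None if x == x else None"),
    ("x = 0", "None if x == x else None"),
    ("x = '0'", "None if x == x else None"),
]


def best_nsec(stmt, setup, repeat=5, number=200_000):
    """Return the best-of-`repeat` time per loop, in nanoseconds."""
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(timings) / number * 1e9


for setup, stmt in CASES:
    print(f"{setup!r:12} {stmt!r:30} {best_nsec(stmt, setup):6.2f} nsec per loop")
```

Running the script once with each `./python` build gives two tables that can be compared line by line.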

So COMPARE_AND_BRANCH only gets us a <1ns improvement in the (common) branching case, and it costs us about 2-4ns in the (uncommon) non-branching case. Honestly, I don't think that's enough of a win.
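The arithmetic behind those two ranges can be checked directly from the timings quoted above. The sketch below only subtracts the reported numbers (in float, int, str order); the variable and function names are illustrative.

```python
# Timings in nsec per loop, copied from the runs above (float, int, str).
main = {
    "non_branching": [12.7, 11.0, 11.2],
    "branching": [10.8, 11.4, 11.3],
}
this_branch = {
    "non_branching": [8.76, 8.89, 8.83],
    "branching": [11.6, 11.8, 12.0],
}


def deltas(slower, faster):
    """Per-type difference in nsec, rounded to two decimals."""
    return [round(s - f, 2) for s, f in zip(slower, faster)]


# What COMPARE_AND_BRANCH saves when the comparison feeds a jump.
branching_gain = deltas(this_branch["branching"], main["branching"])

# What COMPARE_AND_BRANCH costs when the comparison result is used as a value.
non_branching_cost = deltas(main["non_branching"], this_branch["non_branching"])

print("branching gain on main (nsec):    ", branching_gain)
print("non-branching cost on main (nsec):", non_branching_cost)
```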

(Benchmarks are slightly slower on this branch, but the difference is in the noise.)