bpo-45508: Specialize INPLACE_ADD by sweeneyde · Pull Request #29024 · python/cpython
I merged with main and ran some microbenchmarks, and it seems decimal += decimal is not that bad: pyperf reported no significant difference for it with either compiler.
Microbenchmark program:
```python
from pyperf import Runner

runner = Runner()
runner.timeit("int+=int",
    setup="from itertools import repeat",
    stmt="x = 0\n"
         "for y in repeat(1, 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
)
runner.timeit("float+=float",
    setup="from itertools import repeat",
    stmt="x = 0.0\n"
         "for y in repeat(1.0, 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
)
runner.timeit("str+=str",
    setup="from itertools import repeat",
    stmt="for y in repeat('a', 10_000):\n"
         "    x = ''; x += y; x += y; x += y; x += y; x += y"
)
runner.timeit("list[0]+=str",
    setup="from itertools import repeat",
    stmt="x = [None]\n"
         "for y in repeat('a', 10_000):\n"
         "    x[0] = ''; x[0] += y; x[0] += y; x[0] += y; x[0] += y; x[0] += y"
)
runner.timeit("float+=int",
    setup="from itertools import repeat",
    stmt="x = 0.0\n"
         "for y in repeat(1, 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
)
runner.timeit("decimal+=decimal",
    setup="from itertools import repeat; from decimal import Decimal as D",
    stmt="x = D(0)\n"
         "for y in repeat(D(1), 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
)
runner.timeit("list[0]+=1",
    setup="from itertools import repeat; from collections import defaultdict",
    stmt="dd = [0]\n"
         "for y in repeat(1, 10_000):\n"
         "    dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y",
)
runner.timeit("defaultdict(int)[0]+=1",
    setup="from itertools import repeat; from collections import defaultdict",
    stmt="dd = defaultdict(int)\n"
         "for y in repeat(1, 10_000):\n"
         "    dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y",
)
```
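Each timed statement repeats the augmented assignment five times per loop iteration, so the `+=` instruction dominates the measured loop. A quick way to see which bytecode that is, sketched with the standard `dis` module (the helper `augmented_add_ops` is hypothetical, not part of the benchmark): CPython 3.10 and earlier compile `x += y` to `INPLACE_ADD`, while 3.11 and later fold it into `BINARY_OP` with the `+=` argument.

```python
import dis


def augmented_add_ops(src: str) -> list[str]:
    """Return the opnames of the '+=' instructions compiled from src (hypothetical helper)."""
    code = compile(src, "<bench>", "exec")
    ops = []
    for ins in dis.get_instructions(code):
        # CPython <= 3.10 emits a dedicated INPLACE_ADD opcode for '+='.
        if ins.opname == "INPLACE_ADD":
            ops.append(ins.opname)
        # CPython >= 3.11 uses BINARY_OP, whose argument is displayed as "+=".
        elif ins.opname == "BINARY_OP" and ins.argrepr == "+=":
            ops.append(ins.opname)
    return ops


# One '+=' instruction per statement, for both a plain name and a subscript target.
print(augmented_add_ops("x = 0\nx += 1; x += 1"))
print(augmented_add_ops("x = [0]\nx[0] += 1"))
```

On CPython 3.11 or later the first call prints `['BINARY_OP', 'BINARY_OP']`; on 3.10 it reports `INPLACE_ADD` instead.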
Results from PGO on MSVC:
| Benchmark | main_inplace_add_micro | specialized_inplace_add_micro |
|---|---|---|
| float+=float | 1.18 ms | 988 us: 1.19x faster |
| int+=int | 1.34 ms | 1.16 ms: 1.15x faster |
| str+=str | 2.08 ms | 1.80 ms: 1.15x faster |
| list[0]+=1 | 2.47 ms | 2.39 ms: 1.03x faster |
| list[0]+=str | 3.62 ms | 3.67 ms: 1.01x slower |
| defaultdict(int)[0]+=1 | 3.29 ms | 3.43 ms: 1.04x slower |
| float+=int | 1.64 ms | 1.74 ms: 1.06x slower |
| Geometric mean | (ref) | 1.05x faster |
Benchmark hidden because not significant (1): decimal+=decimal
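The ratio labels in these tables look like pyperf's `compare_to` convention: when the specialized build is faster the label is reference time / new time, and when it is slower the label is new time / reference time, so both directions read as a factor of at least 1. A small sketch re-deriving the MSVC labels from the displayed times (the helpers `to_seconds` and `speedup_label` are hypothetical, and only the `ms` and `us` units that appear in the table are handled):

```python
# Hypothetical helpers for re-deriving the "Nx faster/slower" labels from the table.
UNITS = {"ms": 1e-3, "us": 1e-6}


def to_seconds(text: str) -> float:
    """Convert a pyperf-style time such as '988 us' or '1.18 ms' to seconds."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def speedup_label(ref: str, new: str) -> str:
    """Label the change from ref to new: 'faster' uses ref/new, 'slower' uses new/ref."""
    ratio = to_seconds(new) / to_seconds(ref)
    if ratio < 1.0:
        return f"{1.0 / ratio:.2f}x faster"
    return f"{ratio:.2f}x slower"


# (benchmark, main, specialized) times copied from the MSVC table above.
MSVC_ROWS = [
    ("float+=float", "1.18 ms", "988 us"),
    ("int+=int", "1.34 ms", "1.16 ms"),
    ("str+=str", "2.08 ms", "1.80 ms"),
    ("list[0]+=1", "2.47 ms", "2.39 ms"),
    ("list[0]+=str", "3.62 ms", "3.67 ms"),
    ("defaultdict(int)[0]+=1", "3.29 ms", "3.43 ms"),
    ("float+=int", "1.64 ms", "1.74 ms"),
]

for name, ref, new in MSVC_ROWS:
    print(f"{name}: {speedup_label(ref, new)}")
```

Because the displayed times are already rounded, the recomputed factors can differ in the last digit: with these inputs every label matches the table except int+=int and str+=str, which come out as 1.16x rather than 1.15x faster.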
Results from PGO on GCC (WSL):
| Benchmark | main_inplace_add_micro2 | specialized_inplace_add_micro2 |
|---|---|---|
| float+=float | 885 us | 672 us: 1.32x faster |
| int+=int | 1.14 ms | 987 us: 1.16x faster |
| defaultdict(int)[0]+=1 | 2.61 ms | 2.49 ms: 1.05x faster |
| list[0]+=1 | 2.03 ms | 1.96 ms: 1.03x faster |
| float+=int | 1.36 ms | 1.32 ms: 1.03x faster |
| str+=str | 1.39 ms | 1.45 ms: 1.04x slower |
| list[0]+=str | 2.75 ms | 2.89 ms: 1.05x slower |
| Geometric mean | (ref) | 1.06x faster |
Benchmark hidden because not significant (1): decimal+=decimal
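The "Geometric mean" row condenses all rows into a single factor. A minimal sketch, assuming it is the geometric mean of the per-benchmark speedups (reference time divided by specialized time), recomputed here from the GCC table's displayed times. The hidden decimal+=decimal result is left out because its ratio is not shown; including it at an assumed 1.00x would still round to 1.06x.

```python
from statistics import geometric_mean

# (main, specialized) mean times in microseconds, copied from the GCC table above.
GCC_TIMES_US = {
    "float+=float": (885, 672),
    "int+=int": (1140, 987),
    "defaultdict(int)[0]+=1": (2610, 2490),
    "list[0]+=1": (2030, 1960),
    "float+=int": (1360, 1320),
    "str+=str": (1390, 1450),
    "list[0]+=str": (2750, 2890),
}

# Per-benchmark speedup: reference time / new time (> 1 means the specialized build is faster).
speedups = [ref / new for ref, new in GCC_TIMES_US.values()]
overall = geometric_mean(speedups)
print(f"Geometric mean: {overall:.2f}x faster")  # prints "Geometric mean: 1.06x faster"
```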