
bpo-45508: Specialize INPLACE_ADD by sweeneyde · Pull Request #29024 · python/cpython

I merged with main and ran some microbenchmarks. It seems decimal += decimal is not that bad: pyperf found no significant difference for it with either compiler.

Microbenchmark program:

```python
from pyperf import Runner

runner = Runner()

runner.timeit("int+=int",
    setup="from itertools import repeat",
    stmt="x = 0\n"
         "for y in repeat(1, 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
    )
runner.timeit("float+=float",
    setup="from itertools import repeat",
    stmt="x = 0.0\n"
         "for y in repeat(1.0, 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
    )
runner.timeit("str+=str",
    setup="from itertools import repeat",
    stmt="for y in repeat('a', 10_000):\n"
         "    x = ''; x += y; x += y; x += y; x += y; x += y"
    )
runner.timeit("list[0]+=str",
    setup="from itertools import repeat",
    stmt="x = [None]\n"
         "for y in repeat('a', 10_000):\n"
         "    x[0] = ''; x[0] += y; x[0] += y; x[0] += y; x[0] += y; x[0] += y"
    )
runner.timeit("float+=int",
    setup="from itertools import repeat",
    stmt="x = 0.0\n"
         "for y in repeat(1, 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
    )
runner.timeit("decimal+=decimal",
    setup="from itertools import repeat; from decimal import Decimal as D",
    stmt="x = D(0)\n"
         "for y in repeat(D(1), 10_000):\n"
         "    x += y; x += y; x += y; x += y; x += y"
    )
runner.timeit("list[0]+=1",
    setup="from itertools import repeat; from collections import defaultdict",
    stmt="dd = [0]\n"
         "for y in repeat(1, 10_000):\n"
         "    dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y",
    )
runner.timeit("defaultdict(int)[0]+=1",
    setup="from itertools import repeat; from collections import defaultdict",
    stmt="dd = defaultdict(int)\n"
         "for y in repeat(1, 10_000):\n"
         "    dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y",
    )
```
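Every statement in the program exercises the in-place add instruction that this PR specializes, including the subscript forms (`x[0] += y`, `dd[0] += y`), which load the item, add in place, and store the result back. A small sketch using the standard `dis` module can confirm this. The helper `count_inplace_adds` is hypothetical, and the opcode name depends on the CPython version: `INPLACE_ADD` through 3.10, `BINARY_OP` with a `+=` argument from 3.11 onward.

```python
import dis


def count_inplace_adds(source: str) -> int:
    """Count in-place '+=' instructions in the module-level code of `source`.

    Hypothetical helper. CPython up to 3.10 emits INPLACE_ADD; from 3.11 on,
    in-place addition is BINARY_OP whose argument is displayed as "+=".
    """
    # Only compile, never execute, so names like `repeat` need not exist.
    code = compile(source, "<bench>", "exec")
    return sum(
        1
        for instr in dis.get_instructions(code)
        if instr.opname == "INPLACE_ADD"
        or (instr.opname == "BINARY_OP" and instr.argrepr == "+=")
    )


# Two of the benchmark statements from the program, copied verbatim.
INT_STMT = (
    "x = 0\n"
    "for y in repeat(1, 10_000):\n"
    "    x += y; x += y; x += y; x += y; x += y"
)
LIST_STMT = (
    "dd = [0]\n"
    "for y in repeat(1, 10_000):\n"
    "    dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y; dd[0] += y"
)

for label, stmt in [("int+=int", INT_STMT), ("list[0]+=1", LIST_STMT)]:
    print(label, count_inplace_adds(stmt))
```

Both statements contain five in-place adds per loop iteration, so each benchmark is dominated by that one instruction plus loop overhead.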

Results from PGO on MSVC:

| Benchmark | main_inplace_add_micro | specialized_inplace_add_micro |
|---|---|---|
| float+=float | 1.18 ms | 988 us: 1.19x faster |
| int+=int | 1.34 ms | 1.16 ms: 1.15x faster |
| str+=str | 2.08 ms | 1.80 ms: 1.15x faster |
| list[0]+=1 | 2.47 ms | 2.39 ms: 1.03x faster |
| list[0]+=str | 3.62 ms | 3.67 ms: 1.01x slower |
| defaultdict(int)[0]+=1 | 3.29 ms | 3.43 ms: 1.04x slower |
| float+=int | 1.64 ms | 1.74 ms: 1.06x slower |
| Geometric mean | (ref) | 1.05x faster |

Benchmark hidden because not significant (1): decimal+=decimal
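The "faster/slower" figures can be rechecked from the two timing columns. The sketch below assumes the convention visible in the table: the ratio is the larger time divided by the smaller, labelled "faster" when the specialized build takes less time. The helper `speedup_label` is hypothetical, not a pyperf API, and it works from the rounded timings shown, so some ratios land one hundredth off (int+=int and str+=str come out as 1.16x rather than the reported 1.15x).

```python
# Rounded MSVC timings from the table, in milliseconds:
# (benchmark, main, specialized)
MSVC_ROWS = [
    ("float+=float", 1.18, 0.988),
    ("int+=int", 1.34, 1.16),
    ("str+=str", 2.08, 1.80),
    ("list[0]+=1", 2.47, 2.39),
    ("list[0]+=str", 3.62, 3.67),
    ("defaultdict(int)[0]+=1", 3.29, 3.43),
    ("float+=int", 1.64, 1.74),
]


def speedup_label(ref: float, changed: float) -> str:
    """Describe `changed` relative to `ref` as 'N.NNx faster' or 'N.NNx slower'.

    Hypothetical helper mirroring the table's convention; not pyperf's API.
    """
    if changed < ref:
        return f"{ref / changed:.2f}x faster"
    if changed > ref:
        return f"{changed / ref:.2f}x slower"
    return "no change"  # placeholder label for identical timings


for name, main_ms, spec_ms in MSVC_ROWS:
    print(f"{name:24} {speedup_label(main_ms, spec_ms)}")
```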

Results from PGO on GCC (WSL):

| Benchmark | main_inplace_add_micro2 | specialized_inplace_add_micro2 |
|---|---|---|
| float+=float | 885 us | 672 us: 1.32x faster |
| int+=int | 1.14 ms | 987 us: 1.16x faster |
| defaultdict(int)[0]+=1 | 2.61 ms | 2.49 ms: 1.05x faster |
| list[0]+=1 | 2.03 ms | 1.96 ms: 1.03x faster |
| float+=int | 1.36 ms | 1.32 ms: 1.03x faster |
| str+=str | 1.39 ms | 1.45 ms: 1.04x slower |
| list[0]+=str | 2.75 ms | 2.89 ms: 1.05x slower |
| Geometric mean | (ref) | 1.06x faster |

Benchmark hidden because not significant (1): decimal+=decimal
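The geometric-mean row combines the per-benchmark ratios multiplicatively, so a 1.32x gain and a 1.05x loss do not simply average out. A minimal sketch recomputing it for the GCC table from the rounded timings shown; the helper `geometric_mean_speedup` is hypothetical. The non-significant decimal+=decimal benchmark is left out because its timings are not listed, and pyperf works from unrounded means, so agreement in the last digit is not guaranteed in general. For these seven rows it comes out at about 1.06x, in line with the reported value.

```python
import math

# Rounded GCC timings from the table, in milliseconds:
# (benchmark, main, specialized)
GCC_ROWS = [
    ("float+=float", 0.885, 0.672),
    ("int+=int", 1.14, 0.987),
    ("defaultdict(int)[0]+=1", 2.61, 2.49),
    ("list[0]+=1", 2.03, 1.96),
    ("float+=int", 1.36, 1.32),
    ("str+=str", 1.39, 1.45),
    ("list[0]+=str", 2.75, 2.89),
]


def geometric_mean_speedup(rows) -> float:
    """Geometric mean of main/specialized time ratios (> 1 means specialized is faster).

    Hypothetical helper: averages the logs of the ratios, then exponentiates.
    """
    logs = [math.log(main / spec) for _, main, spec in rows]
    return math.exp(sum(logs) / len(logs))


print(f"{geometric_mean_speedup(GCC_ROWS):.2f}x faster")
```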