bpo-45510: Specialize BINARY_SUBTRACT by corona10 · Pull Request #29010 · python/cpython

bedevere-bot

🤖 New build scheduled with the buildbot fleet by @corona10 for commit fb360a2b562300ac17a10c49b43e0f496f51a0ee 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

corona10 changed the title ~~[WIP + Experiment] Implement Specializing BINARY_SUBTRACT~~ [Experiment] Implement Specializing BINARY_SUBTRACT

Oct 18, 2021

corona10 changed the title ~~[Experiment] Implement Specializing BINARY_SUBTRACT~~ bpo-45510: Specialize BINARY_SUBTRACT

Oct 18, 2021

corona10 marked this pull request as ready for review

October 18, 2021 11:19

🤖 New build scheduled with the buildbot fleet by @corona10 for commit b4608cca7dccaf21358e4da854d0701cbe0d0e01 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

The reason of build bot failure looks same with #29024
So unrelated with this PR.

Sorry about the conflicts. I wanted to get both the CALL_FUNCTION specializations PRs merged.

Sorry about the conflicts. I wanted to get both the CALL_FUNCTION specializations PRs merged.

I can do this all day :)

A marginal speedup
Note that the top three speedups are probably just random variations.

My concern with this change is that it will slow down any subtraction on anything other than pairs of ints and floats.
I'd like to see some microbenchmarks for int - float, float - int, and decimal - decimal.

Note that BINARY_ADD has the same issue. If we can fix that, then we could use the same approach here.

A marginal speedup
Note that the top three speedups are probably just random variations.

My concern with this change is that it will slow down any subtraction on anything other than pairs of ints and floats.
I'd like to see some microbenchmarks for int - float, float - int, and decimal - decimal.

Note that BINARY_ADD has the same issue. If we can fix that, then we could use the same approach here.

I share the same concerns

@markshannon cc @pablogsal @Fidget-Spinner

My concern with this change is that it will slow down any subtraction on anything other than pairs of ints and floats.

I definitely agree with your concern. So do I, At first, I thought that it's okayish since we already have the issue with BINARY_ADD. So if this change is not worth to applied, let's stale the PR until the issue is solved.

I'd like to see some microbenchmarks for int - float, float - int, and decimal - decimal.

Whether merge this PR or not, it's worth measuring such benchmarks, we can get some data from this experiment.
I will share the result after writing some scripts.

@markshannon For your data

Benchmark	specialize_baseline	specialize
int - int	30.1 ns	26.5 ns: 1.13x faster
float - float	24.1 ns	20.2 ns: 1.19x faster
int - float	37.6 ns	36.6 ns: 1.03x faster
decimal - decimal	63.6 ns	71.1 ns: 1.12x slower
Geometric mean	(ref)	1.04x faster

Benchmark hidden because not significant (1): float - int

from pyperf import Runner

runner = Runner()

runner.timeit("int - int",
    setup="""
import random
a = random.randint(1, 1000000)
b = random.randint(1, 1000000)
""",
    stmt="""
x = a - b
""")
runner.timeit("float - float",
    setup="""
import random
a = random.uniform(1.5, 3.9)
b = random.uniform(1.5, 3.9)
""",
    stmt="""
x = a - b
"""
    )
runner.timeit("int - float",
    setup="""
import random
a = random.randint(1, 100000)
b = random.uniform(1.5, 3.9)
""",
    stmt="""
x = a - b
"""
    )
runner.timeit("float - int",
    setup="""
import random
a = random.randint(1, 100000)
b = random.uniform(1.5, 3.9)
""",
    stmt="""
x = b - a
"""
    )

runner.timeit("decimal - decimal",
    setup="""
import random
from decimal import Decimal as D
a = D(random.randint(1, 100000))
b = D(random.randint(1, 100000))
""",
    stmt="""
x = a - b
"""
    )

@markshannon

Experiment 2

concept: If either left or right is float handle as both float.
BINARY_SUBTRACT_INT + BINARY_SUBTRACT_FLOAT = BINARY_SUBTRACT_FAST
This implementation is a kind of workaround so you may not want but if we only handle (int, float) variation, it will work.
Need to handle PyFloat_FromDouble is failed and stats handling is needed(This is just PoC)
c492437

Benchmark	specialize_baseline	specialize2
int - int	30.1 ns	27.5 ns: 1.09x faster
float - float	24.1 ns	20.7 ns: 1.16x faster
int - float	37.6 ns	22.5 ns: 1.67x faster
decimal - decimal	63.6 ns	67.9 ns: 1.07x slower
Geometric mean	(ref)	1.15x faster

Benchmark hidden because not significant (1): float - int

Unreliable my VM test with experiment2. :)

Benchmark	subtract_baseline	subtract_fast
2to3	421 ms	412 ms: 1.02x faster
chaos	112 ms	104 ms: 1.07x faster
deltablue	6.87 ms	6.69 ms: 1.03x faster
fannkuch	575 ms	583 ms: 1.01x slower
hexiom	9.91 ms	9.79 ms: 1.01x faster
json_dumps	18.0 ms	17.5 ms: 1.03x faster
logging_format	12.7 us	11.5 us: 1.11x faster
logging_simple	11.4 us	10.2 us: 1.12x faster
meteor_contest	141 ms	147 ms: 1.04x slower
nqueens	121 ms	117 ms: 1.04x faster
pathlib	34.3 ms	30.3 ms: 1.13x faster
pickle	15.2 us	14.6 us: 1.04x faster
pickle_list	5.46 us	5.36 us: 1.02x faster
python_startup	14.3 ms	14.1 ms: 1.01x faster
python_startup_no_site	9.74 ms	9.50 ms: 1.02x faster
raytrace	482 ms	450 ms: 1.07x faster
regex_compile	212 ms	201 ms: 1.05x faster
regex_dna	261 ms	254 ms: 1.03x faster
regex_effbot	4.17 ms	4.05 ms: 1.03x faster
regex_v8	32.0 ms	32.5 ms: 1.02x slower
scimark_sor	218 ms	212 ms: 1.03x faster
spectral_norm	154 ms	152 ms: 1.01x faster
telco	8.63 ms	8.24 ms: 1.05x faster
unpack_sequence	63.6 ns	60.9 ns: 1.04x faster
unpickle	20.9 us	19.3 us: 1.08x faster
unpickle_pure_python	423 us	375 us: 1.13x faster
xml_etree_parse	221 ms	210 ms: 1.05x faster
xml_etree_iterparse	153 ms	150 ms: 1.02x faster
xml_etree_generate	118 ms	115 ms: 1.03x faster
xml_etree_process	86.0 ms	82.6 ms: 1.04x faster
Geometric mean	(ref)	1.03x faster

Benchmark hidden because not significant (16): float, go, json_loads, logging_silent, nbody, pickle_dict, pickle_pure_python, pidigits, pyflate, richards, scimark_fft, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, sqlite_synth, unpickle_list

Will need reworking once #29482 is merged. Sorry to make you redo this, yet again 🙁

I think the end result will be better though, as we won't need yet another ADAPTIVE instruction.

@markshannon
#29482 looks great approach, I will create the new PR once the #29482 is merged.
Thanks for letting me know.

Should this PR be closed (in favor of #29523)?

corona10 deleted the BINARY_SUBTRACT_specialize branch

February 15, 2024 01:40