bpo-45510: Specialize BINARY_SUBTRACT by corona10 · Pull Request #29010 · python/cpython
🤖 New build scheduled with the buildbot fleet by @corona10 for commit fb360a2b562300ac17a10c49b43e0f496f51a0ee 🤖
If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
corona10
changed the title
[WIP + Experiment] Implement Specializing BINARY_SUBTRACT
[Experiment] Implement Specializing BINARY_SUBTRACT
corona10
changed the title
[Experiment] Implement Specializing BINARY_SUBTRACT
bpo-45510: Specialize BINARY_SUBTRACT
corona10
marked this pull request as ready for review
🤖 New build scheduled with the buildbot fleet by @corona10 for commit b4608cca7dccaf21358e4da854d0701cbe0d0e01 🤖
If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
The reason of build bot failure looks same with #29024
So unrelated with this PR.
Sorry about the conflicts. I wanted to get both the CALL_FUNCTION specializations PRs merged.
I can do this all day :)
A marginal speedup
Note that the top three speedups are probably just random variations.
My concern with this change is that it will slow down any subtraction on anything other than pairs of ints and floats.
I'd like to see some microbenchmarks for int - float, float - int, and decimal - decimal.
Note that BINARY_ADD has the same issue. If we can fix that, then we could use the same approach here.
A marginal speedup
Note that the top three speedups are probably just random variations.My concern with this change is that it will slow down any subtraction on anything other than pairs of ints and floats.
I'd like to see some microbenchmarks forint - float,float - int, anddecimal - decimal.Note that
BINARY_ADDhas the same issue. If we can fix that, then we could use the same approach here.
I share the same concerns
@markshannon cc @pablogsal @Fidget-Spinner
My concern with this change is that it will slow down any subtraction on anything other than pairs of ints and floats.
I definitely agree with your concern. So do I, At first, I thought that it's okayish since we already have the issue with BINARY_ADD. So if this change is not worth to applied, let's stale the PR until the issue is solved.
I'd like to see some microbenchmarks for int - float, float - int, and decimal - decimal.
Whether merge this PR or not, it's worth measuring such benchmarks, we can get some data from this experiment.
I will share the result after writing some scripts.
@markshannon For your data
| Benchmark | specialize_baseline | specialize |
|---|---|---|
| int - int | 30.1 ns | 26.5 ns: 1.13x faster |
| float - float | 24.1 ns | 20.2 ns: 1.19x faster |
| int - float | 37.6 ns | 36.6 ns: 1.03x faster |
| decimal - decimal | 63.6 ns | 71.1 ns: 1.12x slower |
| Geometric mean | (ref) | 1.04x faster |
Benchmark hidden because not significant (1): float - int
from pyperf import Runner runner = Runner() runner.timeit("int - int", setup=""" import random a = random.randint(1, 1000000) b = random.randint(1, 1000000) """, stmt=""" x = a - b """) runner.timeit("float - float", setup=""" import random a = random.uniform(1.5, 3.9) b = random.uniform(1.5, 3.9) """, stmt=""" x = a - b """ ) runner.timeit("int - float", setup=""" import random a = random.randint(1, 100000) b = random.uniform(1.5, 3.9) """, stmt=""" x = a - b """ ) runner.timeit("float - int", setup=""" import random a = random.randint(1, 100000) b = random.uniform(1.5, 3.9) """, stmt=""" x = b - a """ ) runner.timeit("decimal - decimal", setup=""" import random from decimal import Decimal as D a = D(random.randint(1, 100000)) b = D(random.randint(1, 100000)) """, stmt=""" x = a - b """ )
Experiment 2
- concept: If either left or right is float handle as both float.
- BINARY_SUBTRACT_INT + BINARY_SUBTRACT_FLOAT = BINARY_SUBTRACT_FAST
- This implementation is a kind of workaround so you may not want but if we only handle (int, float) variation, it will work.
- Need to handle
PyFloat_FromDoubleis failed and stats handling is needed(This is just PoC) - c492437
| Benchmark | specialize_baseline | specialize2 |
|---|---|---|
| int - int | 30.1 ns | 27.5 ns: 1.09x faster |
| float - float | 24.1 ns | 20.7 ns: 1.16x faster |
| int - float | 37.6 ns | 22.5 ns: 1.67x faster |
| decimal - decimal | 63.6 ns | 67.9 ns: 1.07x slower |
| Geometric mean | (ref) | 1.15x faster |
Benchmark hidden because not significant (1): float - int
Unreliable my VM test with experiment2. :)
| Benchmark | subtract_baseline | subtract_fast |
|---|---|---|
| 2to3 | 421 ms | 412 ms: 1.02x faster |
| chaos | 112 ms | 104 ms: 1.07x faster |
| deltablue | 6.87 ms | 6.69 ms: 1.03x faster |
| fannkuch | 575 ms | 583 ms: 1.01x slower |
| hexiom | 9.91 ms | 9.79 ms: 1.01x faster |
| json_dumps | 18.0 ms | 17.5 ms: 1.03x faster |
| logging_format | 12.7 us | 11.5 us: 1.11x faster |
| logging_simple | 11.4 us | 10.2 us: 1.12x faster |
| meteor_contest | 141 ms | 147 ms: 1.04x slower |
| nqueens | 121 ms | 117 ms: 1.04x faster |
| pathlib | 34.3 ms | 30.3 ms: 1.13x faster |
| pickle | 15.2 us | 14.6 us: 1.04x faster |
| pickle_list | 5.46 us | 5.36 us: 1.02x faster |
| python_startup | 14.3 ms | 14.1 ms: 1.01x faster |
| python_startup_no_site | 9.74 ms | 9.50 ms: 1.02x faster |
| raytrace | 482 ms | 450 ms: 1.07x faster |
| regex_compile | 212 ms | 201 ms: 1.05x faster |
| regex_dna | 261 ms | 254 ms: 1.03x faster |
| regex_effbot | 4.17 ms | 4.05 ms: 1.03x faster |
| regex_v8 | 32.0 ms | 32.5 ms: 1.02x slower |
| scimark_sor | 218 ms | 212 ms: 1.03x faster |
| spectral_norm | 154 ms | 152 ms: 1.01x faster |
| telco | 8.63 ms | 8.24 ms: 1.05x faster |
| unpack_sequence | 63.6 ns | 60.9 ns: 1.04x faster |
| unpickle | 20.9 us | 19.3 us: 1.08x faster |
| unpickle_pure_python | 423 us | 375 us: 1.13x faster |
| xml_etree_parse | 221 ms | 210 ms: 1.05x faster |
| xml_etree_iterparse | 153 ms | 150 ms: 1.02x faster |
| xml_etree_generate | 118 ms | 115 ms: 1.03x faster |
| xml_etree_process | 86.0 ms | 82.6 ms: 1.04x faster |
| Geometric mean | (ref) | 1.03x faster |
Benchmark hidden because not significant (16): float, go, json_loads, logging_silent, nbody, pickle_dict, pickle_pure_python, pidigits, pyflate, richards, scimark_fft, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, sqlite_synth, unpickle_list
Will need reworking once #29482 is merged. Sorry to make you redo this, yet again 🙁
I think the end result will be better though, as we won't need yet another ADAPTIVE instruction.
@markshannon
#29482 looks great approach, I will create the new PR once the #29482 is merged.
Thanks for letting me know.
Should this PR be closed (in favor of #29523)?
corona10
deleted the
BINARY_SUBTRACT_specialize
branch