◐ Shell
reader mode source ↗
Skip to content

bpo-46939: Specialize calls to Python classes#31707

Closed
Fidget-Spinner wants to merge 16 commits into
python:mainfrom
Fidget-Spinner:specialize_py_class_calls
Closed

bpo-46939: Specialize calls to Python classes#31707
Fidget-Spinner wants to merge 16 commits into
python:mainfrom
Fidget-Spinner:specialize_py_class_calls

Conversation

@Fidget-Spinner

@Fidget-Spinner Fidget-Spinner commented Mar 6, 2022

Copy link
Copy Markdown
Member

CALL_X supporting __init__ inlining:

  • CALL_PY_EXACT_ARGS
  • CALL_PY_WITH_DEFAULTS
  • CALL

Micro benchmark using pyperf (Windows, PGO):

-m pyperf timeit -s "
>> class A:
>>  def __init__(self, a):
>>   self.a =a
>> " "A(1)" -o ..\pyperf-output\spec_py_class.json

Mean +- std dev: [spec_py_class_main] 159 ns +- 5 ns -> [spec_py_class] 106 ns +- 3 ns: 1.51x faster

https://bugs.python.org/issue46939

@Fidget-Spinner

Copy link
Copy Markdown
Member Author

I nearly went insane writing and debugging this. The general idea follows Mark's comments in faster-cpython/ideas#267 (comment). However, since we're specializing __init__, we need to return self instead of the None that __init__ normally returns. This required a significant amount of hackery.

@Fidget-Spinner Fidget-Spinner left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

I've added some comments on things I've tested against and had to debug. For reviewers in case some parts aren't so clear.

@brandtbucher

Copy link
Copy Markdown
Member

Nice! Haven't had a chance to review this yet (I can probably get to it tomorrow, though).

Unfortunately, I have another, larger PR open that heavily conflicts with this one. Is it okay if this waits until #31709 is merged? The caching in this PR will need to be reworked a bit to use the new inline caching mechanism, but it shouldn't be too difficult.

@brandtbucher brandtbucher self-requested a review March 7, 2022 00:49
@markshannon

Copy link
Copy Markdown
Member

This feels a bit fragile to me, but it is an interesting alternative to my approach of pushing an additional cleanup frame (https://github.com/python/cpython/compare/main...faster-cpython:specialize-calls-to-normal-python-classes?expand=1)

This PR should be faster for calls to Python classes, but the extra frame field will have a cost for calls to Python functions.

@Fidget-Spinner

Fidget-Spinner commented Mar 7, 2022

Copy link
Copy Markdown
Member Author

This feels a bit fragile to me

Indeed, hence why I nearly went insane :). One very easy way to trip over is that all new CALL_PY_* opcodes must also handle this special case, like how the rest are currently doing so. I've added a CALL_PY_FRAME_PASS_SELF() macro so that it's more obvious to people. Most of our test suite will crash (I've tried this) if you forget to add it. Python class creation is an integral part of Python after all.

P.S. Do your benchmarks say anything? I'm really sorry but right now I can't benchmark.

P.P.S. That's an interesting approach that I can't currently wrap my head around. Sorry if I created any duplicate work.

@Fidget-Spinner

Fidget-Spinner commented Mar 8, 2022

Copy link
Copy Markdown
Member Author

I've updated the first comment with a micro benchmark using pyperf on Windows with PGO. The results show 50% speedup for class creation over main.

@Fidget-Spinner

Copy link
Copy Markdown
Member Author

Pyperformance shows a speedup in object-heavy workloads. 1% speedup on average:

Slower (10):
- fannkuch: 603 ms +- 6 ms -> 621 ms +- 4 ms: 1.03x slower
- sqlite_synth: 3.39 us +- 0.06 us -> 3.49 us +- 0.13 us: 1.03x slower
- unpickle_list: 7.42 us +- 0.15 us -> 7.63 us +- 0.15 us: 1.03x slower
- mako: 17.7 ms +- 0.1 ms -> 18.1 ms +- 0.1 ms: 1.02x slower
- logging_simple: 8.91 us +- 0.12 us -> 9.09 us +- 0.14 us: 1.02x slower
- django_template: 55.7 ms +- 0.7 ms -> 56.8 ms +- 2.0 ms: 1.02x slower
- pickle: 15.9 us +- 0.5 us -> 16.1 us +- 0.2 us: 1.02x slower
- pidigits: 297 ms +- 1 ms -> 302 ms +- 1 ms: 1.01x slower
- logging_format: 10.1 us +- 0.2 us -> 10.2 us +- 0.2 us: 1.01x slower
- unpickle: 19.9 us +- 0.3 us -> 20.1 us +- 0.2 us: 1.01x slower

Faster (20):
- raytrace: 486 ms +- 4 ms -> 448 ms +- 3 ms: 1.08x faster
- chaos: 113 ms +- 1 ms -> 104 ms +- 1 ms: 1.08x faster
- float: 126 ms +- 1 ms -> 119 ms +- 2 ms: 1.06x faster
- scimark_lu: 193 ms +- 6 ms -> 184 ms +- 1 ms: 1.05x faster
- scimark_sor: 191 ms +- 2 ms -> 184 ms +- 2 ms: 1.04x faster
- chameleon: 11.3 ms +- 0.2 ms -> 10.9 ms +- 0.1 ms: 1.03x faster
- deltablue: 6.57 ms +- 0.07 ms -> 6.38 ms +- 0.06 ms: 1.03x faster
- telco: 10.4 ms +- 0.4 ms -> 10.1 ms +- 0.2 ms: 1.03x faster
- dulwich_log: 123 ms +- 1 ms -> 119 ms +- 1 ms: 1.03x faster
- scimark_fft: 538 ms +- 12 ms -> 523 ms +- 3 ms: 1.03x faster
- json_dumps: 19.6 ms +- 0.3 ms -> 19.2 ms +- 0.2 ms: 1.02x faster
- pathlib: 30.9 ms +- 0.4 ms -> 30.1 ms +- 0.5 ms: 1.02x faster
- nbody: 150 ms +- 2 ms -> 147 ms +- 2 ms: 1.02x faster
- tornado_http: 207 ms +- 4 ms -> 203 ms +- 3 ms: 1.02x faster
- go: 232 ms +- 2 ms -> 228 ms +- 2 ms: 1.02x faster
- unpack_sequence: 75.2 ns +- 0.9 ns -> 74.0 ns +- 0.7 ns: 1.02x faster
- pyflate: 711 ms +- 9 ms -> 700 ms +- 6 ms: 1.02x faster
- regex_effbot: 5.32 ms +- 0.06 ms -> 5.24 ms +- 0.06 ms: 1.02x faster
- regex_compile: 224 ms +- 1 ms -> 221 ms +- 1 ms: 1.01x faster
- pickle_list: 7.18 us +- 0.07 us -> 7.10 us +- 0.08 us: 1.01x faster

Benchmark hidden because not significant (27): 2to3, crypto_pyaes, hexiom, html5lib, json_loads, logging_silent, meteor_contest, nqueens, pickle_dict, pickle_pure_python, python_startup, python_startup_no_site, regex_dna, regex_v8, richards, scimark_monte_carlo, scimark_sparse_mat_mult, spectral_norm, sympy_expand, sympy_integrate, sympy_sum, sympy_str, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

@brandtbucher

Copy link
Copy Markdown
Member

Should this be closed in favor of #99331?

@brandtbucher brandtbucher removed their request for review November 18, 2022 21:17
@Fidget-Spinner

Copy link
Copy Markdown
Member Author

I'll close it when that one's merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants