
bpo-46564: Optimize super().meth() calls via adaptive superinstructions #30992

Closed
Fidget-Spinner wants to merge 11 commits into python:main from Fidget-Spinner:zero_cost_super

Conversation

Fidget-Spinner (Member) commented Jan 28, 2022

super().meth() calls should now have almost no overhead compared with the corresponding self.meth() call.

Summary of changes:

  • typeobject.c -- refactoring to reuse code during specialization; also uses InterpreterFrame instead of PyFrameObject to benefit from lazy frame creation. Some of these changes are partially taken from bpo-43563: Introduce dedicated opcodes for super calls #24936. All credit to @vladima (I've tried to properly credit them in the news item too.)
  • specialize.c -- specializes the 0-argument and 2-argument forms of super().
  • ceval.c -- performs both the CALL and the LOAD_METHOD in one superinstruction without intermediates (and both are specialized forms too).
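
For reference, the sketch below shows the two spellings of super() that specialize.c targets, next to the plain self.meth() baseline. The class and method names (A, B, zero_arg, two_arg, plain) and the helper global_names are hypothetical, not taken from the PR. It only inspects Python-level bytecode with the standard dis module; the exact opcode sequence (for example separate CALL and LOAD_METHOD instructions versus a fused instruction) differs between CPython versions, so treat the dis output as illustrative rather than as what this branch emits.

```python
import dis


class A:
    def f(self):
        return "A.f"


class B(A):
    def zero_arg(self):
        # 0-argument form: the compiler supplies __class__ and self implicitly
        return super().f()

    def two_arg(self):
        # 2-argument form: class and instance are passed explicitly
        return super(B, self).f()

    def plain(self):
        # Baseline: an ordinary method call on self
        return self.f()


def global_names(func):
    """Hypothetical helper: names that func loads via LOAD_GLOBAL-family opcodes."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_GLOBAL")
    }


if __name__ == "__main__":
    b = B()
    # All three calls resolve to the same method, A.f
    print(b.zero_arg(), b.two_arg(), b.plain())
    # Both super() forms load the builtin `super` as a global; plain() does not
    for meth in (B.zero_arg, B.two_arg, B.plain):
        print(meth.__name__, sorted(global_names(meth)))
    dis.dis(B.zero_arg)
```

The point of the comparison is that both super() forms must first look up the global name super before the method lookup, while the baseline starts from the local self.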

TODO:
benchmarks!

https://bugs.python.org/issue46564

markshannon self-assigned this Jan 28, 2022
Fidget-Spinner marked this pull request as draft January 28, 2022 18:14
Fidget-Spinner (Member, Author)

Marking as draft, as I need to make this work with the new CALL convention.

markshannon (Member)

Maybe we should merge #31002 first, as that PR is simpler.
It would also allow us to compare the performance of just the specialization, without the removal of frame object allocation.

Fidget-Spinner (Member, Author)

Quoting @markshannon: "Maybe we should merge #31002 first, as that PR is simpler. It would also allow us to compare the performance of just the specialization, without the removal of frame object allocation."

👍

Fidget-Spinner (Member, Author)

Mark, I'm going to run benchmarks on deltablue first, since it uses the 2-argument form of super(). I'll address your optimization ideas for the 0-argument form once those results come back. Fingers crossed.

arhadthedev (Member) left a comment

A couple of indentation-related inconsistencies:

Fidget-Spinner (Member, Author)

Quoting my earlier comment: "Mark, I'm going to run benchmarks on deltablue first, since it uses the 2-argument form of super(). I'll address your optimization ideas for the 0-argument form once those results come back. Fingers crossed."

Well, that was depressing. deltablue shows only a 1.03x speedup. Looking closer at the code, super() isn't called in any tight loops, which might explain it. Maybe I need to pull out microbenchmarks now.
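
A microbenchmark that puts super() in a tight loop, which deltablue does not do, might look like the sketch below. It is hypothetical (not the benchmark used in this PR): the class names and the bench helper are illustrative, and it adds the 2-argument form that deltablue uses next to the 0-argument form and the self.meth() baseline. It uses timeit.repeat and keeps the minimum of several runs, a common way to reduce the effect of background noise; pyperf would give more rigorous numbers.

```python
import timeit


class A:
    def f(self):
        pass


class B(A):
    def two_arg(self):
        super(B, self).f()  # 2-argument form, as used in deltablue

    def zero_arg(self):
        super().f()  # 0-argument form

    def plain(self):
        self.f()  # baseline: ordinary method call


def bench(number=1_000_000, repeat=5):
    """Hypothetical helper: best-of-`repeat` seconds for `number` calls of each variant."""
    b = B()
    results = {}
    for stmt in ("b.two_arg()", "b.zero_arg()", "b.plain()"):
        # Passing globals lets the timed statement see the instance b
        times = timeit.repeat(stmt, number=number, repeat=repeat, globals={"b": b})
        # The minimum is the run least disturbed by other activity on the machine
        results[stmt] = min(times)
    return results


if __name__ == "__main__":
    res = bench()
    base = res["b.plain()"]
    for stmt, secs in res.items():
        print(f"{stmt:14} {secs:.3f} s  ({secs / base:.2f}x the self.f() baseline)")
```

Defining the classes at module level (rather than inside a timeit setup string) keeps B a true global, which is closer to how the 2-argument form appears in real code.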

Fidget-Spinner (Member, Author) commented Feb 1, 2022

Microbenchmarks show that super().meth() calls have sped up by more than 2.2x (5.80 s to 2.46 s in the run below). This is faster than that other attempt because there are also speedups from LOAD_METHOD_CACHED:
(Extremely unscientific; I'm short on time to set up pyperf right now.)

import timeit

setup = """
class A:
    def f(self): pass
class B(A):
    def g(self): super().f()
    def h(self): self.f()

b = B()
"""

# super() call
print(timeit.timeit("b.g()", setup=setup, number=20_000_000))
# reference
print(timeit.timeit("b.h()", setup=setup, number=20_000_000))

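Results (total seconds for 20,000,000 calls; the first number is b.g(), which uses super(), the second is the b.h() reference):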

# Main
5.796037399995839
2.4094066999969073

# This branch
2.4578273000006448
2.3718886000060593

So super().meth() is now only about 4% slower than the corresponding self.meth() call (2.46 s vs 2.37 s), whereas it was about 2.4x as slow previously (5.80 s vs 2.41 s). If I manage to incorporate your suggestions correctly, this will effectively come down to a competition between LOAD_GLOBAL_BUILTIN (super) and LOAD_FAST (self).
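
The ratios can be checked directly against the single run reported above. The snippet below only plugs in those timings (total seconds for 20,000,000 calls each, matching number=20_000_000 in the timeit calls); it is plain arithmetic on one unscientific run, not a new measurement.

```python
# Timings from the run above: total seconds for 20,000,000 calls each
CALLS = 20_000_000
main = {"super": 5.796037399995839, "self": 2.4094066999969073}
branch = {"super": 2.4578273000006448, "self": 2.3718886000060593}

# How much faster super().f() got on this branch
speedup = main["super"] / branch["super"]

# Cost of super().f() relative to self.f(), before and after
ratio_before = main["super"] / main["self"]
ratio_after = branch["super"] / branch["self"]

# Per-call cost in nanoseconds on this branch
ns_super = branch["super"] / CALLS * 1e9
ns_self = branch["self"] / CALLS * 1e9

print(f"speedup of super().f(): {speedup:.2f}x")
print(f"super/self before: {ratio_before:.2f}x, after: {ratio_after:.2f}x")
print(f"per call on branch: super {ns_super:.1f} ns, self {ns_self:.1f} ns")
```

With these inputs the speedup comes out at about 2.36x, and the super/self ratio drops from about 2.41x to about 1.04x, a gap of roughly 4 ns per call.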


Labels: awaiting core review, performance (Performance or resource usage)

7 participants