bpo-32492: 1.6x speed up in namedtuple attribute access using C fast-path #10495

pablogsal

Timing benchmarks

Attribute Access

import perf

runner = perf.Runner()
runner.timeit("a.x",
              stmt="a.x",
              setup="import collections;A=collections.namedtuple('A','x')")

./python -m perf compare_to old.json new.json -v
Mean +- std dev: [old] 280 ns +- 3 ns -> [new] 111 ns +- 1 ns: 2.52x faster

Apparently, there is a regression in the current master. This is the comparison against 3.7:

Mean +- std dev: [old] 176 ns +- 2 ns -> [new] 110 ns +- 2 ns: 1.61x faster (-38%)
Significant (t=177.69)
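The perf package used in these benchmarks has since been renamed pyperf. For a quick local sanity check without it, a rough equivalent can be sketched with the standard-library timeit module. This is only an approximation under our own assumptions: timeit.repeat() returns raw totals, and taking the fastest run is a different statistic from perf's mean +- std dev, so the absolute numbers will not match the figures above. The helper name ns_per_access is hypothetical.

```python
import timeit

# Same setup as the perf benchmark above: a one-field namedtuple instance.
SETUP = "import collections; A = collections.namedtuple('A', 'x'); a = A(42)"


def ns_per_access(stmt="a.x", setup=SETUP, number=200_000, repeat=5):
    """Return the fastest observed time per execution of `stmt`, in ns."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    print(f"a.x: {ns_per_access():.1f} ns per access")
```

Running the same script under an unpatched and a patched interpreter gives a crude before/after comparison; pyperf remains the better tool for numbers worth reporting.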

Creation

(Just to check that creation is not slower)

import perf

runner = perf.Runner()
runner.timeit("collections.namedtuple('A','x')",
              stmt="collections.namedtuple('A','x')",
              setup="import collections")

Mean +- std dev: [old_creation] 209 us +- 3 us -> [new_creation] 207 us +- 4 us: 1.01x faster

import perf

runner = perf.Runner()
runner.timeit("A(2324)",
              stmt="A(2324)",
              setup="import collections;A=collections.namedtuple('A','x')")

Mean +- std dev: [old_creation_obj] 1.41 us +- 0.03 us -> [new_creation_obj] 1.41 us +- 0.02 us: 1.00x faster (-0%)

Cache efficiency

Baseline

❯ perf stat -r 200 -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./python -c "
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
"


 Performance counter stats for './python -c
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
' (200 runs):

         1,469,290      cache-references:u                                            ( +-  0.26% )
            20,240      cache-misses:u            #    1.378 % of all cache refs      ( +-  8.58% )
       146,812,273      cycles:u                                                      ( +-  0.24% )
       201,131,089      instructions:u            #    1.37  insn per cycle           ( +-  0.01% )
        40,257,360      branches:u                                                    ( +-  0.01% )
             1,175      faults:u                                                      ( +-  0.01% )
                 0      migrations:u

          0.050526 +- 0.000281 seconds time elapsed  ( +-  0.56% )

Patched

❯ perf stat -r 200 -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./python -c "
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
"

 Performance counter stats for './python -c
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
' (200 runs):

         1,471,736      cache-references:u                                            ( +-  0.11% )
             7,196      cache-misses:u            #    0.489 % of all cache refs      ( +-  6.94% )
       145,004,120      cycles:u                                                      ( +-  0.07% )
       201,182,075      instructions:u            #    1.39  insn per cycle           ( +-  0.01% )
        40,219,107      branches:u                                                    ( +-  0.01% )
             1,174      faults:u                                                      ( +-  0.01% )
                 0      migrations:u

          0.048499 +- 0.000222 seconds time elapsed  ( +-  0.44% )

https://bugs.python.org/issue32492

vstinner

Sorry, I didn't check your implementation, but did you consider reusing the existing structseq type to implement namedtuple? https://bugs.python.org/issue28638#msg298499 Last time I ran a microbenchmark, structseq was 1.9x faster than namedtuple at getting an attribute by name.

In the meantime, I removed the property_descr_get() micro-optimization because it wasn't correct and caused 3 different crashes (bpo-30156, commit e972c13). So I guess that structseq is now even faster than namedtuple at getting an attribute :-)
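For context on the structseq suggestion: time.struct_time is one of CPython's built-in structseq types. As far as we can tell from CPython's structseq.c, its named fields are exposed as read-only member descriptors (the same descriptor type that __slots__ creates), which read straight from the item array instead of calling a Python-level getter. The sketch below only inspects that descriptor and checks that named and positional access agree; it does not reproduce the 1.9x timing.

```python
import inspect
import time

# time.struct_time is a structseq type: a tuple subclass with named fields.
t = time.struct_time((2018, 11, 13, 0, 59, 0, 1, 317, 0))

# The named field lives on the type as a member descriptor.
descr = type(t).__dict__["tm_year"]
print(type(descr).__name__, inspect.ismemberdescriptor(descr))

# Named and positional access read the same item of the sequence.
print(t.tm_year == t[0], t.tm_mon == t[1], isinstance(t, tuple))
```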

pablogsal

> Sorry, I didn't check your implementation, but did you consider reusing the existing structseq type to implement namedtuple? https://bugs.python.org/issue28638#msg298499 Last time I ran a microbenchmark, structseq was 1.9x faster than namedtuple at getting an attribute by name.

Hmm, I did not consider this, but that would involve more significant and fundamental changes than this Pull Request. Also, there is apparently an issue that Josh Rosenberg ran into when implementing the idea. I am happy to give it a go if people agree that it is a good idea :) But I think we can start with this Pull Request, as it is simpler and gives an immediate speedup.

serhiy-storchaka

You do not need a subclass of property. You just need a descriptor.

Also look at the __slots__ implementation.
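As a rough illustration of "just a descriptor", here is a pure-Python sketch of an index-based getter that does not subclass property. The name TupleGetter is hypothetical and the real _tuplegetter is written in C; this only mirrors the behaviour we assume it has: return the descriptor itself on class access, otherwise index into the tuple. A __slots__ member descriptor does essentially the same work with a fixed offset into the instance.

```python
from typing import Any, Optional


class TupleGetter:
    """Hypothetical pure-Python stand-in for an index-based C descriptor."""

    def __init__(self, index: int, doc: Optional[str] = None) -> None:
        self.index = index
        self.__doc__ = doc  # per-instance docstring, as namedtuple fields have

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:  # class access, e.g. Point.x, returns the descriptor
            return self
        return obj[self.index]  # instance access is a single tuple index


class Point(tuple):
    __slots__ = ()
    x = TupleGetter(0, "Alias for field number 0")
    y = TupleGetter(1, "Alias for field number 1")


p = Point((10, 20))
print(p.x, p.y)  # 10 20
print(Point.x.__doc__)  # Alias for field number 0
```

Because it defines only __get__, this sketch is a non-data descriptor, a detail that matters for help() output.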

pablogsal

@serhiy-storchaka Thanks! I will take a look into that. Independently, if we don't move the property object to the header file, it is not possible to subclass property in C. What do you think we should do about that?

serhiy-storchaka

Try to make the constructor arguments positional-only and repeat the benchmarks for creating a namedtuple type. I think this can save a few percent of creation time.
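The suggestion concerns the C-level __new__, where Argument Clinic can declare parameters positional-only. The same calling convention can be sketched in pure Python with the "/" marker from PEP 570, assuming Python 3.8 or later. The class IndexGetter is hypothetical, and the sketch shows only the semantics (keyword calls are rejected), not the speed difference.

```python
class IndexGetter:
    """Hypothetical descriptor whose constructor takes positional-only args."""

    def __new__(cls, index, doc, /):  # "/" makes index and doc positional-only
        self = super().__new__(cls)
        self.index = index
        self.__doc__ = doc
        return self

    def __get__(self, obj, objtype=None):
        return self if obj is None else obj[self.index]


g = IndexGetter(0, "Alias for field number 0")  # positional call: accepted
try:
    IndexGetter(index=0, doc="Alias for field number 0")
except TypeError as exc:
    print("keyword call rejected:", exc)
```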

pablogsal

@serhiy-storchaka Here are the results (commit e5bca1d):

import perf

runner = perf.Runner()
runner.timeit("collections.namedtuple('A','x')",
              stmt="collections.namedtuple('A','x')",
              setup="import collections")

❯ ./python -m perf compare_to  ../cpython_baseline/old_creation.json new_creation.json
Mean +- std dev: [old_creation] 107 us +- 3 us -> [new_creation] 103 us +- 2 us: 1.03x faster (-4%)

rhettinger

This patch looks great. Thanks for the effort to get this done :-)

Before this gets committed, please make a couple of improvements.

1. The _tuplegetter() API needs to more fully emulate property():

>>> set(dir(property)) - set(dir(_tuplegetter))
{'__delete__', 'fdel', 'deleter', '__isabstractmethod__', 'setter', '__set__', 'getter', 'fget', 'fset'}

Part of the reason is that we want tuplegetter() to be a drop-in substitute, supporting whatever interactions users have had with it before now (this is an old API). Another reason is that tuplegetter() needs to be recognized as a data descriptor so that its docstrings show up in the output of help().

Formerly, running >>> help(namedtuple('Point', ['x', 'y'])(10, 20)) would produce:

 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  x
 |      Alias for field number 0
 |  
 |  y
 |      Alias for field number 1

Now we get:

 |  Methods defined here:
 |
 |  x = <_collections._tuplegetter object>
 |  y = <_collections._tuplegetter object>

2. The code in tuplegetterdescr_get can be made tighter by using PyTuple_GET_SIZE() and PyTuple_GET_ITEM() instead of PyTuple_GetItem(). That saves the function call overhead and a redundant second PyTuple_Check (the second check is 100% branch predictable, which is good, but it still incurs two chained memory accesses).
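The shape of the tightened getter can be modelled in Python. This is a sketch under stated assumptions, not the C code: tuple_getter_get is a hypothetical name, len() and indexing stand in for PyTuple_GET_SIZE() and PyTuple_GET_ITEM(), and the bounds check assumes that valid_index() (the helper applied in a later commit of this PR) uses the common C idiom (size_t)i < (size_t)limit, which folds 0 <= i < limit into one unsigned comparison. Python integers are unbounded, so the cast is emulated by reducing modulo 2**64.

```python
SIZE_T_MODULUS = 1 << 64  # assumption: a 64-bit size_t


def valid_index(i: int, limit: int) -> bool:
    """Emulate the C test `(size_t)i < (size_t)limit` (assumes 0 <= limit < 2**63).

    Casting to an unsigned type turns any negative i into a huge value, so a
    single comparison rejects both i < 0 and i >= limit.
    """
    return (i % SIZE_T_MODULUS) < limit


def tuple_getter_get(index: int, obj: object) -> object:
    """Hypothetical model of the tightened getter: type check, bounds check, fetch."""
    if not isinstance(obj, tuple):  # the single PyTuple_Check
        raise TypeError(f"expected a tuple, got {type(obj).__name__}")
    if not valid_index(index, len(obj)):  # len() stands in for PyTuple_GET_SIZE
        raise IndexError("tuple index out of range")
    # Stands in for PyTuple_GET_ITEM, which in C performs no further checks.
    return obj[index]


print(tuple_getter_get(1, (10, 20)))  # 20
print(valid_index(-1, 2), valid_index(2, 2))  # False False
```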

When running timings, we should not only benchmark the 1.6x to 2.5x improvement, but also compare against regular attribute access on an instance of a class that defines __slots__. Ideally, tuplegetter() should be almost as fast as member objects, since both do almost exactly the same work (indexing into a tuple should be only slightly slower than indexing into slots).

serhiy-storchaka

To be recognized as a data descriptor, tuplegetter needs to implement __set__.

I do not think that _tuplegetter should fully emulate property. It is enough to implement the common API of properties and data descriptors.

>>> sorted(set(dir(int.numerator)) - set(dir(_tuplegetter)))
['__delete__', '__name__', '__objclass__', '__qualname__', '__set__']
>>> class A: __slots__ = 'x'
... 
>>> sorted(set(dir(A.x)) - set(dir(_tuplegetter)))
['__delete__', '__name__', '__objclass__', '__qualname__', '__set__']
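Why __set__ matters for help() can be checked with the standard inspect module, which pydoc builds on: inspect.isdatadescriptor() reports True when the descriptor's type defines __set__ or __delete__, and inspect.classify_class_attrs() files a get-only descriptor under "method" but one with __set__ under "data". As we understand pydoc, that is why a get-only _tuplegetter was listed under "Methods defined here". The classes GetOnly, GetSet, P1, P2 and the helper kind_of are hypothetical.

```python
import inspect


class GetOnly:
    """Hypothetical non-data descriptor: defines __get__ only."""

    def __init__(self, index, doc):
        self.index = index
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        return self if obj is None else obj[self.index]


class GetSet(GetOnly):
    """Adds a refusing __set__, which makes instances data descriptors."""

    def __set__(self, obj, value):
        raise AttributeError("can't set attribute")


class P1(tuple):
    __slots__ = ()
    x = GetOnly(0, "Alias for field number 0")


class P2(tuple):
    __slots__ = ()
    x = GetSet(0, "Alias for field number 0")


def kind_of(cls, name):
    """Return the category inspect.classify_class_attrs assigns to `name`."""
    for attr in inspect.classify_class_attrs(cls):
        if attr.name == name:
            return attr.kind
    return None


print(inspect.isdatadescriptor(P1.__dict__["x"]), kind_of(P1, "x"))  # False method
print(inspect.isdatadescriptor(P2.__dict__["x"]), kind_of(P2, "x"))  # True data
```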

pablogsal

@rhettinger @serhiy-storchaka I am not sure what the best path to follow is. To make this PR simpler, what do you think about just reverting commit 1e14509 so that tuplegetter inherits from property, setting all the attributes? This "emulates" property, but I cannot see any obvious downside, and it makes the implementation much cleaner and (IMHO) more maintainable. Manually implementing all the property methods here seems to me like it would significantly raise the maintenance burden, not to mention future divergence from property.

@serhiy-storchaka Although tuplegetter would be fine just implementing the common API of data descriptors, people may be using the old properties in the namedtuple directly, accessing some particular fields that only properties have, so not implementing them may be a regression, right?

serhiy-storchaka

I do not see the sense in fully emulating a property, and in any case your past versions did not do this.

The setter and deleter attributes are used only for defining the setter and the deleter in the class definition. _tuplegetter will not be used in such a way.

__isabstractmethod__ does not make sense since _tuplegetter is not abstract.

fget, fset and fdel were not provided. In any case, users should not depend on such an implementation detail. To get a getter for a specific attribute, they should use operator.attrgetter or a trivial lambda.
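For code that used to pull the getter out of the old property (for example Point.x.fget), the suggested replacement looks like the sketch below. operator.attrgetter and a lambda rely only on normal attribute access, so they work whatever descriptor type backs the field; the Point type and sample data are illustrative.

```python
from collections import namedtuple
from operator import attrgetter

Point = namedtuple("Point", ["x", "y"])
points = [Point(3, 1), Point(1, 2), Point(2, 0)]

get_x = attrgetter("x")  # instead of relying on Point.x.fget
print(get_x(points[0]))  # 3
print(sorted(points, key=attrgetter("y")))
# [Point(x=2, y=0), Point(x=3, y=1), Point(x=1, y=2)]
print(list(map(lambda p: p.x, points)))  # [3, 1, 2]
```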

We are at the pre-alpha stage. If some code will be broken by this change, we have enough time to fix it.

pablogsal

@serhiy-storchaka So you propose to implement:

['__delete__', '__name__', '__objclass__', '__qualname__', '__set__']

Is that correct?

serhiy-storchaka

Try to implement just __set__ and __delete__. If this is not enough for pydoc, implement more.

pablogsal

After db3ffcd:

>>> from collections import namedtuple
>>> help(namedtuple('Point', ['x', 'y'])(10, 20))

|  Data descriptors defined here:
|
|  x
|      Alias for field number 0
|
|  y
|      Alias for field number 1
|

>>> set(dir(property)) - set(dir(_tuplegetter))
{'fget', 'deleter', '__isabstractmethod__', 'getter', 'setter', 'fdel', 'fset'}

pablogsal

Benchmark against a class defining __slots__ and against plain tuples:

import perf

runner = perf.Runner()

runner.timeit("namedtuple",
        stmt="a.x",
        setup="""\
import collections
a = collections.namedtuple('A', ['x'])(3)
""")

runner.timeit("slots",
        stmt="b.x",
        setup="""\
class B:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x
b = B(3)
""")

runner.timeit("tuple",
        stmt="b[0]",
        setup="""\
b = (3,)
""")

Results (no PGO):

./python ../experiment.py
.....................
namedtuple: Mean +- std dev: 34.7 ns +- 0.6 ns
.....................
slots: Mean +- std dev: 38.3 ns +- 1.8 ns
.....................
tuple: Mean +- std dev: 34.6 ns +- 0.2 ns

It turns out that the latest _tuplegetter takes about 9% less time per access than __slots__ (34.7 ns vs 38.3 ns) and is basically on par with the tuple.

I ran some experiments regarding the inlining of PyTuple_GetItem: even without PGO, the difference is unnoticeable under -O3. The x86 diff for the function call is:

        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        ...
        cmpq    %rdx, %rsi
        cmpq    %rax, 32(%rsi)

The two cmpq instructions are handled almost perfectly by the branch predictor, and the stack setup (the push, the two movs and what follows) is almost negligible because the stack frame of tuplegetterdescr_get is reused. On the other hand, at -O2 and below this changes and you can notice a small jitter in the benchmarks, so I think it is a good idea to inline the call to PyTuple_GetItem as @rhettinger recommended.

rhettinger

FWIW, my timings show a significant improvement (more than 2x) and that named tuple attribute access is now on par with access to member objects created by __slots__.

Nice work.

bedevere-bot

@rhettinger: Please replace # with GH- in the commit message next time. Thanks!

bpo-32492: 2.5x speed up in namedtuple attribute access using C fast path

97c3ee5

pablogsal self-assigned this Nov 13, 2018

pablogsal requested a review from rhettinger November 13, 2018 00:59

the-knights-who-say-ni added the CLA signed label Nov 13, 2018

bedevere-bot added the awaiting merge label Nov 13, 2018

pablogsal added 2 commits November 13, 2018 01:03

Add News entry

5dc78ff

fixup! bpo-32492: 2.5x speed up in namedtuple attribute access using C fast path

631ab2c

pablogsal commented Nov 13, 2018

serhiy-storchaka reviewed Nov 13, 2018

pablogsal changed the title ~~bpo-32492: 2.5x speed up in namedtuple attribute access using C fast-path~~ Nov 13, 2018

Check for tuple in the __get__ of the new descriptor and don't cache the descriptor itself

21be735

pablogsal force-pushed the bpo32492 branch from 7d1ef82 to 21be735 November 13, 2018 10:53

Don't inherit from property. Implement GC methods to handle __doc__

1e14509

serhiy-storchaka reviewed Nov 15, 2018

methane reviewed Nov 15, 2018

pablogsal added 3 commits November 16, 2018 00:15

Add a test for the docstring substitution in descriptors

f9ca1e4

Update NEWS entry to reflect time against 3.7 branch

a6187b8

Simplify implementation with argument clinic, better error messages, only __new__

7d2dd84

rhettinger self-assigned this Nov 26, 2018

Use positional-only parameters for the __new__

e5bca1d

Use PyTuple_GET_SIZE and PyTuple_GET_ITEM to tighter the implementation of tuplegetterdescr_get

96eae4c

serhiy-storchaka reviewed Dec 29, 2018

ZackerySpytz reviewed Dec 29, 2018

Implement __set__ to make tuplegetter a data descriptor

9838c39

pablogsal force-pushed the bpo32492 branch from db3ffcd to 9838c39 December 29, 2018 19:28

pablogsal and others added 3 commits December 29, 2018 19:30

Use Py_INCREF now that we inline PyTuple_GetItem

2e350ed

Apply the valid_index() function, saving one test

c9772e8

Move Py_None test out of the critical path.

62cd7fd

rhettinger merged commit 3f5fc70 into python:master Dec 30, 2018

bedevere-bot removed the awaiting merge label Dec 30, 2018

pablogsal deleted the bpo32492 branch December 30, 2018 09:34

mtreinish mentioned this pull request Sep 15, 2021

[WIP] Encapsulate instruction in args in "Instruction" class Qiskit/qiskit#7020

Closed

Conversation

pablogsal commented Nov 13, 2018 • edited

vstinner commented Nov 13, 2018 • edited by bedevere-bot

pablogsal commented Nov 13, 2018 • edited

serhiy-storchaka commented Nov 14, 2018

pablogsal commented Nov 14, 2018

serhiy-storchaka commented Nov 26, 2018

pablogsal commented Nov 27, 2018 • edited

rhettinger commented Dec 23, 2018

serhiy-storchaka commented Dec 23, 2018

pablogsal commented Dec 28, 2018 • edited

serhiy-storchaka commented Dec 28, 2018

pablogsal commented Dec 28, 2018 • edited

serhiy-storchaka commented Dec 28, 2018

pablogsal commented Dec 29, 2018

pablogsal commented Dec 29, 2018 • edited

rhettinger commented Dec 30, 2018

bedevere-bot commented Dec 30, 2018

8 participants