bpo-32492: 1.6x speed up in namedtuple attribute access using C fast-path#10495
bpo-32492: 1.6x speed up in namedtuple attribute access using C fast-path#10495rhettinger merged 14 commits into
Conversation
|
Sorry, I didn't check your implementation, but did you consider to reuse existing structseq type to implement namedtuple? https://bugs.python.org/issue28638#msg298499 Last time I ran a microbenchmark, structseq was 1.9x faster than namedtuple to get an attribute by name. In the meanwhile, I removed property_descr_get() micro-optimization because it wasn't correct and caused 3 different crashed, bpo-30156, commit e972c13. So I get that structseq is now even faster than namedtuple to get an attribute :-) |
Sorry, something went wrong.
…the descriptor itself
Hummm I did not consider this, but that will involve more significant and fundamental changes than this Pull Request. Also, apparently there is this issue that Josh Rosenberg ran into when implementing the idea. I am happy to give it a go if people agree that is a good idea :) But I think this we can start with this Pull Request as is simpler and it gives some immediate speedup. |
Sorry, something went wrong.
|
You do not need a subclass of Look also at |
Sorry, something went wrong.
|
@serhiy-storchaka Thanks! I will take a look into that. Independently, if we don't move the property object to the header file, is not possible to subclass property in C. What do you think we should do with that? |
Sorry, something went wrong.
|
Try to make constructor arguments positional-only and repeat benchmarks for creating a namedtuple type. I think this can save several percents of creation time. |
Sorry, something went wrong.
|
@serhiy-storchaka Here are the results (commit e5bca1d): import perf
runner = perf.Runner()
runner.timeit("collections.namedtuple('A','x')",
stmt="collections.namedtuple('A','x')",
setup="import collections") |
Sorry, something went wrong.
|
This patch looks great. Thanks for the effort to get this done :-) Before this gets committed, please make a couple of improvements. 1. The _tuplegetter() API needs to more fully emulate property(): Part of the reason is that we want tuplegetter() to be a drop in substitute, supporting whatever interactions users have had with it before now (this is an old API). Another reason is that tuplegetter() needs to be recognized as a data descriptor so that its docstrings show-up in the output of help(). Formerly, running Now we get: 2. The code in tuplegetterdescr_get can be made tighter by using PyTuple_GET_SIZE() and PyTuple_GET_ITEM() instead of PyTuple_GetItem(). That saves the function call overhead and a redundant duplicate PyTuple_Check (the second check is 100% branch predictable which is good, but still incurs two chained memory accesses). In running timings, we should not only benchmark 1.6x to 2.5 improvement, but also compare against regular attribute access to an instance of a class that defines |
Sorry, something went wrong.
|
To be recognized as a data descriptor I do not think that |
Sorry, something went wrong.
|
@rhettinger @serhiy-storchaka I am not sure what is the best path to follow. To make this PR simpler, what do you think about just reverting commit 1e14509 so @serhiy-storchaka Although |
Sorry, something went wrong.
…on of tuplegetterdescr_get
|
I do not see sense in full emulating a Attributes
We are at the pre-alpha stage. If some code will be broken by this change, we have enough time to fix it. |
Sorry, something went wrong.
|
@serhiy-storchaka So you propose to implement:
Is that correct? |
Sorry, something went wrong.
|
Try to implement just |
Sorry, something went wrong.
|
After db3ffcd: >>> from collections import namedtuple
>>> help(namedtuple('Point', ['x', 'y'])(10, 20))
| Data descriptors defined here:
|
| x
| Alias for field number 0
|
| y
| Alias for field number 1
|
>>> set(dir(property)) - set(dir(_tuplegetter))
{'fget', 'deleter', '__isabstractmethod__', 'getter', 'setter', 'fdel', 'fset'} |
Sorry, something went wrong.
|
Benchmark agains a class definning Results (no PGO): It turns that the latest I ran some experiments regarding the inlining of PyTuple_GetItem and even without PGO is unoticeable under The two cmpq are smashed by the branch predictor and the stack allocation (the push and the two movs and subsequent) are almost negligible as the stack of |
Sorry, something went wrong.
|
FWIW, my timings show a significant improvement (more than 2x) and that named tuple attribute access is now on-par with access to member objects created by _slots_. Nice work. |
Sorry, something went wrong.
|
@rhettinger: Please replace |
Sorry, something went wrong.
Timing benchmarks
Attribute Access
Apparently, there is a regression in the current master. This is the comparison against 3.7:
Creation
(Just to check that creation is not slower)
Cache efficiency
Baseline
Patched
https://bugs.python.org/issue32492