gh-126868: Add freelist for compact int objects#126865
Conversation
|
I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours. |
Sorry, something went wrong.
Actually, scratch that -- I'll wait until the tests are passing here. That's required for PGO builds. |
Sorry, something went wrong.
|
Tests are passing now, but I disabled returning objects to the freelist at a couple of places. Changing |
Sorry, something went wrong.
|
What happens exactly when you change these lines: PyStackRef_CLOSE_SPECIALIZED(left, (destructor)PyObject_Free); // needs to be converted to freelistto ? |
Sorry, something went wrong.
|
@markshannon When I change the lines several tests related to refleaks in the CI are failing. For example test_no_memleak, which can be reproduced from the command line with: The output should be |
Sorry, something went wrong.
Fidget-Spinner
left a comment
There was a problem hiding this comment.
The "leaks" are due to the freelist itself. They are a sign the freelist is working :).
Can you please convert all to _PyLong_ExactDealloc and apply the suggestion below, and tell me how many allocations this removes?
Sorry, something went wrong.
|
@eendebakpt sorry I pushed to your branch as I'm really eager to get benchmark results on this :). |
Sorry, something went wrong.
|
On my machine using the benchmark script provided above (release build, no PGO, no LTO): |
Sorry, something went wrong.
|
Results:
This is a great result! Congrats and great work @eendebakpt ! |
Sorry, something went wrong.
picnixz
left a comment
There was a problem hiding this comment.
Out of curiosity, are compact integers allowed to be signed or not? (you mentioned that we only focus on single digit numbers but the C API considers compact objects as being an implementation detail IIRC). If not, how hard would it be to make them support free lists? (I don't have much knowledge in free lists; for instance it's a mystery to me for how the free list grows)
Sorry, something went wrong.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Sorry, something went wrong.
I assume you mean the small int checks in |
Sorry, something went wrong.
…e-126868.yOoHSY.rst
|
I have made the requested changes; please review again |
Sorry, something went wrong.
|
Thanks for making the requested changes! @markshannon: please review the changes made to this pull request. |
Sorry, something went wrong.
Sorry, something went wrong.
markshannon
left a comment
There was a problem hiding this comment.
Thanks for doing this. The code looks good now.
The speedup may not be as large as initially claimed due to addition checks for small integers, but it is a real speedup and should be further improved by #127620
Sorry, something went wrong.
|
It seems likely the thread sanitizer failure is not caused by this PR, but this PR does expose it. |
Sorry, something went wrong.
|
The TSan failure is fixed in main now, if you want to merge main back into the PR. |
Sorry, something went wrong.
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️Hi! The buildbot iOS ARM64 Simulator 3.x has failed when building commit 5fc6bb2. What do you need to do:
You can take a look at the buildbot page here: https://buildbot.python.org/#/builders/1380/builds/2109 Summary of the results of the build (if available): == Click to see traceback logsTraceback (most recent call last):
|
Sorry, something went wrong.
We can add freelists for the int object to improve performance. Using the new methods from #121934 the amount of code needed for adding a freelist is quite small. We only implement the freelist for compact ints (e.g. a single digit). For multi-digit int objects adding freelists is more complex (we need a size-based freelist) and the gains are smaller (for very large int objects the allocation is not a significant part of the computation time)
Notes:
long_deallocand_PyLong_ExactDeallocare almost identical, we could keep justlong_deallocat the cost of a tiny bit of performance.Some references to discussions on freelists
The freelist improves performance of
intoperations in microbenchmarks:Benchmark script
On the pyperformance test suite (actually, a subset of the suite, not all benchmarks run on my system) shows the percentage of successfull freelist allocations increases significantly
Main:
PR