bpo-37802: Slightly improve perfomance of PyLong_FromSize_t()#15192
Conversation
mdickinson
left a comment
There was a problem hiding this comment.
LGTM
Sorry, something went wrong.
|
BTW, I think the behavior of That's certainly not a reason not to merge this, though. I'll happily rebase my changes on top of this once it's in. |
Sorry, something went wrong.
(Posted; see https://bugs.python.org/issue37812 and #15203 .) |
Sorry, something went wrong.
0ba7b3c to
9c52079
Compare
August 24, 2019 18:11
gnprice
left a comment
There was a problem hiding this comment.
This looks good to me... except it's not ambitious enough! 🙂 Details below.
(Happily the more ambitious version is only a line or two more.)
Sorry, something went wrong.
|
@mdickinson anything should be done to get this merged? |
Sorry, something went wrong.
9c52079 to
b4c286f
Compare
August 25, 2019 08:35
|
I decided to deduplicate |
Sorry, something went wrong.
gnprice
left a comment
There was a problem hiding this comment.
Neat! This simplifies the code and it's faster (based on the measurements you posted in the issue thread) -- that's a nice combination. 😄
I'd be glad to see the signed versions get a similar treatment. (As a separate PR.) Might not deliver a performance win at the same level, since your hypothesis is that the speedup here is from cutting out the sign logic...
but it should be performance-neutral at worst, and it's plausible it could be an improvement by generating less code so it's friendlier to the cache. Also could speed things up just as a natural consequence of deduplicating the code, by making a single place to apply all the optimizations we have in one function or another, plus any new ones you think of while staring at it.
(And even if it does turn out only performance-neutral: I think opinions will vary, but for code deduplication at this scale I'd certainly be in favor.)
Sorry, something went wrong.
|
Wise words!
Yet I think for doc-comments the risk is worth it. If the comment simply
describes the interface of the function, i.e. the things a caller needs to
know about it... then if that ever needs an update, there will be much
bigger burdens than updating the comment. Especially for functions in the
public API like these.
But that is why I would like the comments to describe the interface
directly (like they do in master), rather than have a dependency on
something more fluid like the arrangement of this code. :-)
…On Thu, Aug 29, 2019, 11:16 Sergey Fedoseev ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In Objects/longobject.c
<#15192 (comment)>:
> @@ -410,6 +414,30 @@ PyLong_FromUnsignedLong(unsigned long ival)
return (PyObject *)v;
}
+/* Same as above, but for unsigned long. */
Non-existing comments never lie and don't require updates 🙂 .
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#15192?email_source=notifications&email_token=AAAG4DJKKLGMQMQG4ULR2JTQHAG7DA5CNFSM4IKUBGHKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCDEW34I#discussion_r319206328>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAG4DMBSVF5OI5LF3HEMEDQHAG7DANCNFSM4IKUBGHA>
.
|
Sorry, something went wrong.
|
The current approach using inline function have the same problems as here (https://bugs.python.org/issue38015). I'll rework it when #15718 will be merged. |
Sorry, something went wrong.
Hmm, you measured this as being a significant speedup in your microbenchmark, right? That hardly seems like a problem 😉 even if there's a way to make the microbenchmark go even faster still. I like this PR as it is 🙂 and I'd be glad to see it merged. It actually simplifies the code at the same time as optimizing it, so there's no need to weigh one value against another. As I commented over on #15718, turning If that is your plan, then it's also basically additive upon the change in this PR. So I think it'd still be good to merge this PR, even if a subsequent PR goes on to turn the function into a macro. |
Sorry, something went wrong.
It induces the same problems as your is_small_int() function on 32-bit platforms. |
Sorry, something went wrong.
Right but that "problem" is that a few extra instructions are emitted, right? And so the microbenchmark is a few percent slower than it would be. If the microbenchmark is nevertheless faster than before, then this is still an optimization. |
Sorry, something went wrong.
https://bugs.python.org/msg351052 I leave it to you to benchmark it on 32-bit platform. |
Sorry, something went wrong.
Hmm -- do you mean you haven't tried it on a 32-bit platform? Because you said that it had a problem there, I assumed that meant you'd seen something empirically. If you have, I think it would be helpful to say what results you got. |
Sorry, something went wrong.
|
I mean I checked how it compiles on godbolt.org and that was sufficient to me to not benchmark it. |
Sorry, something went wrong.
Cool, got it. Well -- as I said upthread, if this PR were to end up making something as big as If you can find a way to get all the desired speedups without that kind of complexity in the code, that'd be great. As I see it, I would also be glad to see a PR version merged that did something like
because that's good for the source code and no worse than a wash on performance. (Note also that anyone who's working to squeeze out the last drop of performance in running their code is much more likely to be using a 64-bit platform already.) |
Sorry, something went wrong.
Agreed, I don't think anyone using a 32-bit platform is going to be overly concerned about a 5% drop in performance. A more significant performance loss may be an issue, but not when it's that small. IMO, an equivalent or larger increase in the performance for 64-bit outweighs an equal or smaller loss for 32-bit (within a reasonable amount). |
Sorry, something went wrong.
I agree that'd be great to write some generic function, but C doesn't allow to write type generic functions and using inline function here induces these unwanted type casts that makes emitted code less efficient in unpredictable way. Such inline function creates a false feeling that it works like type generic function (as it was with There are platforms besides x86 and I don't want to spend time assessing how inline function degrades performance on each of these platforms, and you? (update):
Where did you get this ~5%? Did you run benchmark? |
Sorry, something went wrong.
|
Here's inline function vs macro demo: https://godbolt.org/z/K5tbF2.
|
Sorry, something went wrong.
0d33a8e to
d0fdead
Compare
September 9, 2019 10:19
Do you mean x86 (32 bits) or x32 (64-bit integers, but use 32 bit pointers)? |
Sorry, something went wrong.
(That's from here: Just as you quoted me saying: it's what you found for |
Sorry, something went wrong.
Consider this the other way around: I don't think it's a good use of time to investigate what small unnecessary bits of work each compiler on each platform emits, or going to great lengths in our code to coax them into doing a bit better. (Remember that there's absolutely nothing in the language that means a compiler can't emit the exact same code in the static-function case as you're seeing in the macro case -- it's purely a matter of how smart the compiler gets. See e.g. this demo: #15216 (comment) ) That can be well worth it when the payoff is large -- when there's an opportunity for a significant win on ordinary Python code. (Or as a good proxy for that: a large win on code that shows up in profiles of ordinary Python code.) But many-line macros and other code complications have a real cost, and the payoff has to be worth the cost. Compilers will emit less-than-optimal code. That's a fact of life, no matter what we do to our code. We have to keep going anyway and write the code in a way that helps it make sense to each other as humans. When the compiler emits code that's much slower than it could be, in a place where that significantly matters, that's one thing... but if we insisted on squeezing the last cycle out of every function, we'd never get anything else done. |
Sorry, something went wrong.
This 5% of difference is the cost of 2 extra instructions on x86_64, and if you check demo, you can see that in this case there is more substantial difference (53 vs 38 instructions and some of them are in loop). |
Sorry, something went wrong.
This code is trivial, making it a macro doesn't make it less trivial. It makes it less "beautiful", but this code was modified once in the last 10 years, so I doubt there are many people interested in reading it.
My PR provides speed-up by eliminating these lines: Lines 390 to 391 in c59295a I guess the person who wrote these lines many years ago was thinking the same way as you, years have passed, but compilers are still not so smart. |
Sorry, something went wrong.
vstinner
left a comment
There was a problem hiding this comment.
LGTM, but I have one last request.
Sorry, something went wrong.
d0fdead to
62d3ba8
Compare
September 10, 2019 18:16
https://bugs.python.org/issue37802