gh-102509: Start initializing `ob_digit` of `_PyLongValue` by illia-v · Pull Request #102510 · python/cpython
@illia-v Can you provide Python code that reproduces the access of the uninitialized value? The theory is that it shouldn't matter that this digit is uninitialized when ob_size = 0, since it should never be accessed. I think the right fix is to find out where we're breaking that assumption.
I'm not convinced that _PyLong_New is the right place to fix this.
I think I see. In the case ob_size=0, we are reading a potentially uninitialised value ob_digit[0], in medium_value, but we then always multiply it by zero. I admit this does make me a bit twitchy, but I think it's fine in practice. Reading uninitialised memory is not undefined behaviour: we'll just get a random integer. (In theory we could get a trap representation, but I don't think trap representations have met real machines running CPython in some number of decades, and we already assume elsewhere that we're working with integer types with no trap representations.) We then multiply that random integer by zero, which will always give zero.
So I think this counts as a false positive, and no fix is necessary.
Pinging @markshannon, who introduced this logic: Mark, I don't see any issue here, and think this should be closed. Agreed?
It may still be worth adding a comment in medium_value to the effect that yes, we know this is accessing allocated but potentially uninitialized memory, and that's okay.
I admit this does make me a bit twitchy [...]
To be clear, if this weren't a hot path then I'd probably add the extra initialization just for peace of mind, but _PyLong_New is a hot path, where we don't want to be doing any more work than strictly necessary.
This does seem to be a real bug.
When working on #102464, I added an assert that zero sized ints had ob_digit[0] == 0. The assertion failed and we never set ob_digit[0] to non-zero if ob_size == 0.
Reading uninitialised memory is not undefined behaviour
Are you sure about this?
The real issue is whether C guarantees that 0 * unitialized_value == 0 and I'm not sure that is does.
It probably should, but you know much C likes undefined behavior 🙁
I expect this code to go when #101291 is implemented, so we might as well fix it for now.
Are you sure about this?
Fairly sure, yes. :-) We either get an uninitialised value (which is fine), or a trap representation (which is not), but trap representations in integers are not a practical problem these days.
Here's a good Stack Overflow answer by Eric Postpischil that quotes chapter and verse: https://stackoverflow.com/a/66840190/270986
The real issue is whether C guarantees that
0 * unitialized_value == 0and I'm not sure that is does.
It should be fine - at that point, the uninitialised value is some value of type digit. And every single legal value of type digit, when multiplied by zero, gives zero.
I think I see. In the case
ob_size=0, we are reading a potentially uninitialised valueob_digit[0], inmedium_value, but we then always multiply it by zero.
I am not very familiar with this part of CPython source, but comments and code itself mention not only ob_size=0 but also -1 and 1, are not they?
| /* Is this PyLong of size 1, 0 or -1? */ | |
| #define IS_MEDIUM_VALUE(x) (((size_t)Py_SIZE(x)) + 1U < 3U) | |
| /* convert a PyLong of size 1, 0 or -1 to a C integer */ | |
| static inline stwodigits | |
| medium_value(PyLongObject *x) | |
| { | |
| assert(IS_MEDIUM_VALUE(x)); | |
| return ((stwodigits)Py_SIZE(x)) * x->long_value.ob_digit[0]; | |
| } |
I am not very familiar with this part of CPython source, but comments and code itself mention not only
ob_size=0but also -1 and 1, are not they?
Yes, but if ob_size == 1 or ob_size == -1 then we're guaranteed that ob_digit[0] has been initialized. It's only in the case ob_size == 0 when we can end up with an uninitialized digit.
If this goes in, we should have a comment explaining why the extra initialization is necessary (possibly with a pointer to this PR or the associated issue), and this comment in longintrepr.h needs updating, too. (I added that comment previously, having already noticed the uninitialised access and figured out that it was safe; I'd assumed that @markshannon had done this intentionally.)
My preferred solution here would be to go back to a more obviously correct spelling for medium_value that only accesses ob_digit[0] when ob_size is nonzero.
We expect the internals of PyLongObject to change again, which may make this change obsolete for 3.13 (but worth doing anyway).
The question is "should we backport this to 3.12?"
@Yhg1s
Thanks @illia-v for the PR, and @markshannon for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
…honGH-102510) (cherry picked from commit fc130c4) Co-authored-by: Illia Volochii <illia.volochii@gmail.com>