gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization#28525
gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization#28525jbms wants to merge 11 commits into
Conversation
|
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
Sorry, something went wrong.
|
Can this possibly go into a point release? I think it would be unfortunate for this bug to remain until 3.11, even though presumably it has been around since forever. |
Sorry, something went wrong.
270353e to
aadec13
Compare
September 23, 2021 01:16
The API deprecation must say 3.11. but the bugfix itself could be backported. We should let this one bake on the main branch for a bit first. |
Sorry, something went wrong.
aadec13 to
ff882a3
Compare
September 23, 2021 01:19
|
PS please do the CLA signing dance (see go/patching#cla internally, it's approved). |
Sorry, something went wrong.
|
I already signed the CLA earlier today, may take until tomorrow to register. |
Sorry, something went wrong.
|
I'm working on adding a mechanism that extension code can use to safely avoid threads hanging, and updating the documentation. I'll update this PR shortly once that is done. |
Sorry, something went wrong.
|
There are legit use cases for pthread_exit(). If PyThread_exit_thread() is deprecated, does it mean that developers using it should call directly |
Sorry, something went wrong.
vstinner
left a comment
There was a problem hiding this comment.
I dislike this PR. When Python is embedded in an application, it means that Python leaks hang threads in some cases, whereas currently these threads exit once they attempt to acquire the GIL.
I get that there are some use cases where calling pthread_exit() is bad, but other cases, calling pthread_exit() sounds less bad than hang these threads.
Would it be possible to make it possible to choose the behavior, and keep the current behavior by default?
Sorry, something went wrong.
ff882a3 to
16b4475
Compare
September 23, 2021 16:38
099dd01 to
8aab6c0
Compare
September 24, 2021 00:51
vstinner
left a comment
There was a problem hiding this comment.
This PR becomes more and more problems. I'm not sure if it is will fix more problems than introducing more problems.
I don't understand why Python should now start handling Windows GUI messages. Why only Windows GUI messages? Do you expect that other operating systems will also need special code in "hang threads"?
Sorry, something went wrong.
8aab6c0 to
d90f393
Compare
September 24, 2021 22:44
gpshead
left a comment
There was a problem hiding this comment.
leaving a few more review comments here. i'll followup on the bug with larger picture PR-existential strategy comments.
Sorry, something went wrong.
d90f393 to
8cf57ec
Compare
September 30, 2021 15:16
|
I have updated the documentation to document the new functions, and changed to use I have not yet made the change suggested by @gpshead to block signals in the pthread version --- I can do that if necessary, but I am not sure what the benefit is. Signal handlers already can't safely attempt to acquire the GIL or use any Python APIs that require the GIL, and therefore I would assume there is no harm in letting them run. Other than that, what are the next steps for this pull request? |
Sorry, something went wrong.
|
This PR is stale because it has been open for 30 days with no activity. |
Sorry, something went wrong.
Sorry, something went wrong.
gpshead
left a comment
There was a problem hiding this comment.
from your docs:
To avoid non-Python threads becoming blocked, or Python-created threads becoming
blocked while executing C extension code, you can use
:c:func:PyThread_TryAcquireFinalizeBlockand
:c:func:PyThread_ReleaseFinalizeBlock.
I think this phrasing undersells the magnitude of the problem and who can and should "fix" it... The "you can use" wording implies that it is the application user who debugged a hang is capable of resolving the problem by adjusting code they own.
But I suspect the reality is that until all Python C API using code in the world is updated to use these APIs (read: never/8-years)... hangs are going to be most frequently attributed to the huge pile of third party code that everybody's application uses. (Probably including a lot within the CPython internals itself).
There isn't really a solution for this. The existing GIL APIs and thus all existing Python C API using code in the world are not structured to check for and handle failure.
So it's mostly a matter of documentation semantics. We should seek to set expectations of who and when these new APIs should explicitly be used: In C code calling back into Python from potentially non-CPython created threads.
(I'd ignore the daemon thread aspect in our advice - anything could be one of those - we ultimately want to get rid of the daemon concept entirely and separately should document it as a "do not use / fundamentally broken / unfixable" feature in the threading module)
Other than the messaging, I think this overall PR looks good and want @Yhg1s to answer whether we backport to 3.12beta or not upon merge.
Sorry, something went wrong.
The finalized blocks is now protected by the main interpreters GIL.
Not sure exactly what the best change to the docs is --- but as I see it, the key message around the hangs is: "In most cases this is harmless" I think it is a rare case indeed where the hang will not be harmless. Aside from the use case of wanting to cleanly finalize and initialize multiple times in a program, cases where hanging is a problem would likely result in a hang or crash without this PR as well. |
Sorry, something went wrong.
Yhg1s
left a comment
There was a problem hiding this comment.
FWIW, I think we should get this into 3.12 (preferably before beta 2, which is scheduled next week). I think it's safe to do so because only an internal struct changes, and that struct is not used from any macros.
Sorry, something went wrong.
One example where it may be useful to acquire the finalize block separately from the GIL:
In any case having a single enum and |
Sorry, something went wrong.
ericsnowcurrently
left a comment
There was a problem hiding this comment.
There are a few things I haven't looked through (e.g. the details of how the "blocks" are implemented). Hopefully this is enough review to get things rolling though.
Sorry, something went wrong.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Sorry, something went wrong.
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
|
I have made the requested changes; please review again This PR can now be considered "on hold" or abandoned given the API questions regarding multiple interpreters. Please see the separate PR #105805 which contains just the minimal change to hang threads without adding any new public APIs. |
Sorry, something went wrong.
|
Thanks for making the requested changes! @gpshead, @ericsnowcurrently: please review the changes made to this pull request. |
Sorry, something went wrong.
|
It looks like this PR was replaced by #105805, which has been merged now. Is there anything to be done here? |
Sorry, something went wrong.
edited by AlexWaygood
LoadingUh oh!
There was an error while loading. Please reload this page.
Copy link Copy MarkdownSorry, something went wrong.
Uh oh!
There was an error while loading. Please reload this page.