bpo-44032: Defer clearing last thread state until last GC has been run.#26285
Conversation
|
@erlend-aasland Can you confirm that this fixes the issue? |
Sorry, something went wrong.
vstinner
left a comment
There was a problem hiding this comment.
A thread state should remain more or less usable after PyInterpreterState_Clear() is called. I don't think that this PR fix the root issue of https://bugs.python.org/issue44032#msg394104
IMO it would be safer to only release datastack_chunk memory in tstate_delete_common(), rather than in PyThreadState_Clear().
This change remains interesting ;-)
Sorry, something went wrong.
I'll do as soon as I get back on my computer. UPDATE: This PR does not fix the issue. I've used @pablogsal's reproducer from #26274 (comment). Here's an excerpt of the dump (complete dump attached): |
Sorry, something went wrong.
|
@vstinner I don't see how a thread state can remain usable after the interpreter has been cleared. |
Sorry, something went wrong.
FYI, freeing datastack chunks in |
Sorry, something went wrong.
|
🤖 New build scheduled with the buildbot fleet by @markshannon for commit ddad17e 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
Sorry, something went wrong.
vstinner
left a comment
There was a problem hiding this comment.
LGTM.
Calling _PyObject_VirtualFree() in tstate_delete_common() sounds like the good thing to do. I doesn't mean that a tstate must remain usable after calling PyThreadState_Clear(state), it's just about consistency.
Sorry, something went wrong.
test_asyncio hangs, unrelated error: https://bugs.python.org/issue44112 |
Sorry, something went wrong.
|
I'm not worried about asyncio, but test_multiprocessing also fails for one and test_posix for another. There doesn't seem to a coherent model of how threads and interpreters are related, or what memory belongs to the interpreter or thread. |
Sorry, something went wrong.
Sounds like an issue should be opened for this, if there's not already one on bpo. |
Sorry, something went wrong.
|
AMD64 RHEL8 Refleaks PR: I can be a random error which doesn't fail with re-run in verbose mode, we don't know since the job hangs before test_posix can be re-run. |
Sorry, something went wrong.
I fail to see the test_multiprocessing error. multiprocessing tests are full of race conditions. Some are legit bugs, some are bugs in the tests. I fixed tons of bugs in multiprocessing and in its test suite last years, but if you search for "multiprocessing" in the bug tracker, you will see a long list of issues open for several weeks/months. IMO it's perfectly safe to merge this PR. All issues that you saw are very likely known (not regression of your change). Clearing the current thread state would be another can of worm, but your PR no longer does that ;-) |
Sorry, something went wrong.
|
Ok, I'll merge this then. |
Sorry, something went wrong.
|
Yeah, it's fine, in the really worst case, we can just revert the change to revisit the options. But again, IMO this function is perfectly safe. |
Sorry, something went wrong.
https://bugs.python.org/issue44032