Issue 37127: Handling pending calls during runtime finalization may cause problems.
Created on 2019-06-01 19:11 by eric.snow, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 19061 | merged | vstinner, 2020-03-18 18:03 | |
| PR 19439 | merged | vstinner, 2020-04-08 17:12 | |
| Messages (12) | |||
|---|---|---|---|
| msg344202 - (view) | Author: Eric Snow (eric.snow) * ![]() |
Date: 2019-06-01 19:11 | |
In Python/lifecycle.c (Py_FinalizeEx) we call _Py_FinishPendingCalls(), right after we stop all non-daemon Python threads but before we've actually started finalizing the runtime state. That call looks for any remaining pending calls (for the main interpreter) and runs them. There's some evidence of a bug there. In bpo-33608 I moved the pending calls to per-interpreter state. We saw failures (sometimes sporadic) on a few buildbots (e.g. FreeBSD) during runtime finalization. However, nearly all of the buildbots were fine, so it may be a question of architecture or slow hardware. See bpo-33608 for details on the failures. There are a number of possibilities, but it's been tricky reproducing the problem in order to investigate. Here are some theories: * daemon threads (a known weak point in runtime finalization) block pending calls from happening until some time after portions of the runtime have already been cleaned up * there's a race that causes the pending calls machinery to get caught in some sort infinite loop (e.g. a pending call fails and gets re-queued) * a corner case in the pending calls logic that triggers only during finalization Here are some other points to consider: * do we have the same problem during subinterpreter finalization (Py_EndInterpreter() rather than runtime finalization)? * perhaps the problem extends beyond finalization, but the conditions are more likely there * the change for bpo-33608 could have introduced the bug rather that exposing an existing one |
|||
| msg344207 - (view) | Author: Eric Snow (eric.snow) * ![]() |
Date: 2019-06-01 19:21 | |
Also, someone did manage to investigate and identify a likely cause: https://bugs.python.org/issue33608#msg342791 |
|||
| msg364489 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-03-18 00:42 | |
In the master branch of Python, trip_signal() calls _PyEval_AddPendingCall(tstate) and tstate is get using _PyRuntimeState_GetThreadState(runtime).
trip_signal() can be called while the GIL is not held: tstate is NULL in this case. For example, it's the case when calling signal.raise_signal().
Problem: _PyEval_AddPendingCall() uses tstate if pending->finishing is non-zero.
if (pending->finishing) {
...
_PyErr_Fetch(tstate, &exc, &val, &tb);
_PyErr_SetString(tstate, PyExc_SystemError,
"Py_AddPendingCall: cannot add pending calls "
"(Python shutting down)");
...
}
pending->finishing was addd in bpo-33608 by the change:
commit 842a2f07f2f08a935ef470bfdaeef40f87490cfc
Author: Eric Snow <ericsnowcurrently@gmail.com>
Date: Fri Mar 15 15:47:51 2019 -0600
bpo-33608: Deal with pending calls relative to runtime shutdown. (gh-12246)
I found this issue while trying to make pending calls per interpreter in bpo-39984: move pending from _PyRuntimeState to PyInterpreterState.
|
|||
| msg364490 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-03-18 00:48 | |
> In bpo-33608 I moved the pending calls to per-interpreter state. We saw failures (sometimes sporadic) on a few buildbots (e.g. FreeBSD) during runtime finalization. However, nearly all of the buildbots were fine, so it may be a question of architecture or slow hardware. See bpo-33608 for details on the failures. That was a crash in multiprocessing tests. PyEval_RestoreThread(tstate) was called with a freed PyThreadState: tstate->interp = 0xdbdbdbdbdbdbdbdb for example. I recently fixed PyEval_RestoreThread() in bpo-39877 to exit properly daemon theads without dereferencing tstate which points to freed memory. To reproduce the bug, it helped me to add a sleep at Python exit. Fixed value (ex: 1 second) or better: random sleep (between 1 ms and 1 sec). |
|||
| msg364543 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-03-18 18:10 | |
> pending->finishing was addd in bpo-33608 by the change: commit 842a2f07f2f08a935ef470bfdaeef40f87490cfc Since this change, Py_AddPendingCall() now requires the thread to have a Python thread state: it is used if pending->finishing is non-zero. The function documentation says: "This function doesn’t need a current thread state to run, and it doesn’t need the global interpreter lock." https://docs.python.org/dev/c-api/init.html#c.Py_AddPendingCall The current implementation seems to contradict the documentation. We may update the documentation to replace "doesn't need" with "requires" (a Python thread state). Or the implementation can use PyGILState_Ensure() and PyGILState_Release() to create a temporary Python thread state if the thread doesn't have one. Removing pending->finishing would only partially fix the issue. With PR 19061, tstate is required to access "ceval" structure. trip_signal() called by the C signal handler has a similar issue: it only requires non-NULL tstate if pending->finishing is non-zero. |
|||
| msg364544 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-03-18 18:21 | |
trip_signal() has another problem. If pending->finishing is non-zero, _PyEval_AddPendingCall() uses the C API whereas the current thread may not hold the GIL. That's forbidden by the C API. The more I think about it, the more I think that pending->finishing is fragile and should be removed. I understood that pending->finishing was added to workaround crashes during Python finalization. I fixed multiple crashes related to daemon threads during Python finalization in bpo-39877. Maybe the bugs that I fixed were the ones which pending->finishing was supposed to work around. |
|||
| msg364545 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-03-18 18:21 | |
My notes on Python finalization: https://pythondev.readthedocs.io/finalization.html |
|||
| msg364550 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-03-18 18:28 | |
New changeset 8849e5962ba481d5d414b3467a256aba2134b4da by Victor Stinner in branch 'master': bpo-39984: trip_signal() uses PyGILState_GetThisThreadState() (GH-19061) https://github.com/python/cpython/commit/8849e5962ba481d5d414b3467a256aba2134b4da |
|||
| msg365997 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-04-08 17:44 | |
See also bpo-40082: trip_signal() gets NULL tstate on Windows on CTRL+C. |
|||
| msg366004 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-04-08 20:10 | |
New changeset cfc3c2f8b34d3864717ab584c5b6c260014ba55a by Victor Stinner in branch 'master': bpo-37127: Remove _pending_calls.finishing (GH-19439) https://github.com/python/cpython/commit/cfc3c2f8b34d3864717ab584c5b6c260014ba55a |
|||
| msg366007 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-04-08 21:18 | |
I removed _pending_calls.finishing for multiple reasons: * _PyEval_AddPendingCall() used the C API whereas the caller may not hold the GIL. * bpo-40082: trip_signal() can be called from a thread which has no Python thread state. On Windows, CTRL+C calls trip_signal() in a new thread a each call. I rewrote trip_signal() to only use the PyInterpreterState ("interp") and avoid PyThreadState ("tstate") in PR 19441 to fix bpo-40082. trip_signal() should read and set atomtic variables: don't modify globals without a lock. _PyEval_AddPendingCall() is not fully async-signal safe yet :-/ Using a lock is unsafe in a signal handler. |
|||
| msg366008 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-04-08 21:19 | |
> Also, someone did manage to investigate and identify a likely cause: > https://bugs.python.org/issue33608#msg342791 I made many changes in Python internals since bpo-33608 was reported. I am not aware of any recent issue with pending calls and so close the issue. If you get new crashes/hangs with pending calls during Python finalization, please open a new issue. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:16 | admin | set | github: 81308 |
| 2020-04-08 21:19:44 | vstinner | set | status: open -> closed resolution: fixed messages: + msg366008 stage: patch review -> resolved |
| 2020-04-08 21:18:03 | vstinner | set | messages: + msg366007 |
| 2020-04-08 20:10:59 | vstinner | set | messages: + msg366004 |
| 2020-04-08 17:44:08 | vstinner | set | messages: + msg365997 |
| 2020-04-08 17:12:53 | vstinner | set | pull_requests: + pull_request18793 |
| 2020-03-18 18:28:57 | vstinner | set | messages: + msg364550 |
| 2020-03-18 18:21:42 | vstinner | set | messages: + msg364545 |
| 2020-03-18 18:21:15 | vstinner | set | messages: + msg364544 |
| 2020-03-18 18:10:40 | vstinner | set | messages: + msg364543 |
| 2020-03-18 18:03:56 | vstinner | set | keywords:
+ patch stage: test needed -> patch review pull_requests: + pull_request18415 |
| 2020-03-18 00:48:47 | vstinner | set | messages: + msg364490 |
| 2020-03-18 00:42:09 | vstinner | set | messages: + msg364489 |
| 2020-03-09 23:40:31 | vstinner | set | nosy:
+ vstinner |
| 2019-06-01 19:21:38 | eric.snow | set | messages: + msg344207 |
| 2019-06-01 19:11:46 | eric.snow | create | |
