gh-107219: Fix concurrent.futures terminate_broken()#109244
Conversation
|
@serhiy-storchaka @methane @ambv @gpshead @pitrou: Would you mind to have a look? I would like to merge this fix as soon as possible since the bug #107219 is affecting very badly the Python workflow. The CI failure rate is very high because of this For now, I prefer to use |
Sorry, something went wrong.
|
With this change, I can no longer reproduce bug. On my Windows VM which has 2 CPUs, I can easily reproduce the hang in around 30 seconds on the Python main branch:
I stressed the test with:
In 8 minutes, I failed to reproduce the bug anymore with this change. Bonus: Moreover, I can no longer hang the test when I interrupt it with CTRL+C. |
Sorry, something went wrong.
Oh! For the first time in like 2 weeks, Note: There are only these two unrelated failures: These 2 tests passed when re-run in verbose mode (Result: FAILURE then SUCCESS). |
Sorry, something went wrong.
serhiy-storchaka
left a comment
There was a problem hiding this comment.
LGTM.
But I have one suggestion and one question/suggestion.
Sorry, something went wrong.
Fix a race condition in concurrent.futures. When a process in the process pool was terminated abruptly (while the future was running or pending), close the connection write end. If the call queue is blocked on sending bytes to a worker process, closing the connection write end interrupts the send, so the queue can be closed. Changes: * _ExecutorManagerThread.terminate_broken() now closes call_queue._writer. * multiprocessing PipeConnection.close() now interrupts WaitForMultipleObjects() in _send_bytes() by cancelling the overlapped operation.
Address Serhiy's review.
9987dc7 to
069fbfa
Compare
September 11, 2023 07:47
|
Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12. |
Sorry, something went wrong.
|
There's a new commit after the PR has been approved. @serhiy-storchaka: please review the changes made to this pull request. |
Sorry, something went wrong.
…109244) Fix a race condition in concurrent.futures. When a process in the process pool was terminated abruptly (while the future was running or pending), close the connection write end. If the call queue is blocked on sending bytes to a worker process, closing the connection write end interrupts the send, so the queue can be closed. Changes: * _ExecutorManagerThread.terminate_broken() now closes call_queue._writer. * multiprocessing PipeConnection.close() now interrupts WaitForMultipleObjects() in _send_bytes() by cancelling the overlapped operation. (cherry picked from commit a9b1f84) Co-authored-by: Victor Stinner <vstinner@python.org>
|
PR merged, thanks for the review @serhiy-storchaka. I wanted to merge this fix ASAP since it prevented to merge others PRs. |
Sorry, something went wrong.
…109244) Fix a race condition in concurrent.futures. When a process in the process pool was terminated abruptly (while the future was running or pending), close the connection write end. If the call queue is blocked on sending bytes to a worker process, closing the connection write end interrupts the send, so the queue can be closed. Changes: * _ExecutorManagerThread.terminate_broken() now closes call_queue._writer. * multiprocessing PipeConnection.close() now interrupts WaitForMultipleObjects() in _send_bytes() by cancelling the overlapped operation. (cherry picked from commit a9b1f84) Co-authored-by: Victor Stinner <vstinner@python.org>
serhiy-storchaka
left a comment
There was a problem hiding this comment.
According to the sources of GetOverlappedResult() in _winapi.c, the only value of err can be ERROR_SUCCESS (0), ERROR_MORE_DATA, ERROR_OPERATION_ABORTED, ERROR_IO_INCOMPLETE.
Sorry, something went wrong.
… (#109255) gh-107219: Fix concurrent.futures terminate_broken() (GH-109244) Fix a race condition in concurrent.futures. When a process in the process pool was terminated abruptly (while the future was running or pending), close the connection write end. If the call queue is blocked on sending bytes to a worker process, closing the connection write end interrupts the send, so the queue can be closed. Changes: * _ExecutorManagerThread.terminate_broken() now closes call_queue._writer. * multiprocessing PipeConnection.close() now interrupts WaitForMultipleObjects() in _send_bytes() by cancelling the overlapped operation. (cherry picked from commit a9b1f84) Co-authored-by: Victor Stinner <vstinner@python.org>
Well, if you're confident, you can modify the By the way, having
Thanks. |
Sorry, something went wrong.
… (#109254) gh-107219: Fix concurrent.futures terminate_broken() (GH-109244) Fix a race condition in concurrent.futures. When a process in the process pool was terminated abruptly (while the future was running or pending), close the connection write end. If the call queue is blocked on sending bytes to a worker process, closing the connection write end interrupts the send, so the queue can be closed. Changes: * _ExecutorManagerThread.terminate_broken() now closes call_queue._writer. * multiprocessing PipeConnection.close() now interrupts WaitForMultipleObjects() in _send_bytes() by cancelling the overlapped operation. (cherry picked from commit a9b1f84) Co-authored-by: Victor Stinner <vstinner@python.org>
Fix a race condition in concurrent.futures. When a process in the process pool was terminated abruptly (while the future was running or pending), close the connection write end. If the call queue is blocked on sending bytes to a worker process, closing the connection write end interrupts the send, so the queue can be closed.
Changes: