bpo-6721: Sanitize logging locks while forking#4071
Conversation
8d51698 to
caa3e6f
Compare
September 13, 2018 08:41
caa3e6f to
a0c2cfb
Compare
September 13, 2018 17:03
Now it fails on the previous code and is fixed by the logging changes. The test now uses a thread to hold the locks during the fork in a synchronized manner. Due to mixing of fork and a thread there is potential for deadlock in the child process. buildbots and time will tell if this actually manifests itself in this test or not. :/
|
A regression test that actually catches the bug and is fixed by this change has been added. Due to mixing of fork and a thread there is potential for deadlock |
Sorry, something went wrong.
|
Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7. |
Sorry, something went wrong.
bpo-6721: When os.fork() was called while another thread holds a logging lock, the child process may deadlock when it tries to log. This fixes that by acquiring all logging locks before fork and releasing them afterwards. A regression test that fails before this change is included. Within the new unittest itself: There is a small _potential_ due to mixing of fork and a thread in the child process if the parent's thread happened to hold a non-reentrant library call lock (malloc?) when the os.fork() happens. buildbots and time will tell if this actually manifests itself in this test or not. :/ A functionality test that avoids that would be a challenge. An alternate test that isn't trying to produce the deadlock itself but just checking that the release and acquire calls are made would be the next best alternative if so. (cherry picked from commit 1900384) Co-authored-by: Gregory P. Smith <greg@krypto.org>
bpo-6721: When os.fork() was called while another thread holds a logging lock, the child process may deadlock when it tries to log. This fixes that by acquiring all logging locks before fork and releasing them afterwards. A regression test that fails before this change is included. Within the new unittest itself: There is a small _potential_ due to mixing of fork and a thread in the child process if the parent's thread happened to hold a non-reentrant library call lock (malloc?) when the os.fork() happens. buildbots and time will tell if this actually manifests itself in this test or not. :/ A functionality test that avoids that would be a challenge. An alternate test that isn't trying to produce the deadlock itself but just checking that the release and acquire calls are made would be the next best alternative if so. (cherry picked from commit 1900384) Co-authored-by: Gregory P. Smith <greg@krypto.org> [Google]
|
Just a heads up: We have reverted this commit in Fedora's Python package (version 3.7.1), because it rendered our graphical installers not bootable. This may have just exposed some bug in the installer or it may have introduced a regression. We are currently investigating the problem with Fedora QA and the installer developers and we will open a BPO issue if we find out a simple reproducer. Details: https://bugzilla.redhat.com/show_bug.cgi?id=1644936 |
Sorry, something went wrong.
This part is simple, acquire and release the logging lock and each logging handler's lock around fork.
A unittest is still needed (relatively trivial to create).
The
logging.handlersmodule has aQueueHandlerandQueueListenerwhich uses thequeuemodule... that is full of locks. I'm not addressing that as I don't see an immediately obvious correct thing to do with that queue. A QueueListener thread won't be running in the forked child anyways so the right thing for anyone using one of those is likely to remove the QueueHandler from their logging config entirely... unless they go on to re-establish their complicated queue based logging handler setup in the child. suggestion: Do nothing. Let people using QueueHandler deal with their own problems as their application sees fit.https://bugs.python.org/issue6721