bpo-37224: Fix test__xxsubinterpreters failure and GIL handling in interp_destroy() by aeros · Pull Request #16293 · python/cpython
Have you been able to reproduce the problem by running ./python -m test test__xxsubinterpreters -F -j30? Does the problem go away with this change applied?
That's been the trickiest part of this issue: I had trouble replicating the test failure at all in my local environment (OS: Arch Linux, Kernel: Linux 5.3.11, CPU: Intel i5-4460):
$ ./python -m test test__xxsubinterpreters -j100 -F
...
== Tests result: INTERRUPTED ==
Test suite interrupted by signal SIGINT.
1000 tests OK.
Perhaps I could run the tests for a few hours instead of sending SIGINT after 1k iterations, or set up a FreeBSD virtual machine and run them there, to match the environment where the failure occurred as closely as possible.
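For the "few hours" variant, a small wrapper could replace the manual Ctrl-C. This is only a rough sketch, assuming a POSIX system; `run_with_time_budget` and its parameters are hypothetical names for illustration and not part of regrtest or the PR. It starts a command, lets it run for a fixed time budget, then sends SIGINT the same way the manual interruption above did.

```python
import signal
import subprocess


def run_with_time_budget(cmd, time_budget_s, grace_s=60):
    """Run cmd for at most time_budget_s seconds, then interrupt it.

    Hypothetical helper (POSIX only). Returns (timed_out, returncode).
    timed_out is False when the command exited on its own, which for a
    run with -F would normally mean a test failed.
    """
    proc = subprocess.Popen(cmd)
    try:
        # The command finished by itself within the budget.
        return False, proc.wait(timeout=time_budget_s)
    except subprocess.TimeoutExpired:
        # Budget used up: send SIGINT, as Ctrl-C would.
        proc.send_signal(signal.SIGINT)
        try:
            return True, proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # The process ignored SIGINT; fall back to a hard kill.
            proc.kill()
            return True, proc.wait()
```

Run from the CPython build directory, it could be called as `run_with_time_budget(["./python", "-m", "test", "test__xxsubinterpreters", "-j100", "-F"], 3 * 60 * 60)`. The three-hour budget is an arbitrary choice.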
Also, I figured that you might have an idea of whether the proposed solution (in the PR or in the comments above) could help with the underlying issue. Admittedly, it was a bit of a stab in the dark, which is why I said I wasn't confident about it. In retrospect, I should have focused on replicating the issue first, but at the time (~Sep 20th) this was my first attempt at working on a complex race condition.
I'm going to leave a comment on https://bugs.python.org/issue37224, so we can discuss it there.
Thanks for the review. I'll investigate further and continue the discussion in the bpo issue.