◐ Shell
reader mode source ↗
Skip to content

gh-151496: Use process groups in test_dtrace#151512

Merged
vstinner merged 3 commits into
python:mainfrom
vstinner:test_dtrace_pg
Jun 17, 2026
Merged

gh-151496: Use process groups in test_dtrace#151512
vstinner merged 3 commits into
python:mainfrom
vstinner:test_dtrace_pg

Conversation

@vstinner

@vstinner vstinner commented Jun 15, 2026

Copy link
Copy Markdown
Member

Create a new process group to run bpftrace commands, so it's possible to kill also child processes on timeout.

Create a new process group to run bpftrace commands, so it's possible
to kill also child processes on timeout.
@vstinner vstinner added the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label Jun 15, 2026
@bedevere-app bedevere-app Bot added the tests Tests in the Lib/test dir label Jun 15, 2026
@vstinner

Copy link
Copy Markdown
Member Author

@vstinner

Copy link
Copy Markdown
Member Author

@stratakis: I addressed your review, thanks. Please review the updated PR.

@vstinner

Copy link
Copy Markdown
Member Author

Tests / Sanitizers / TSan (free-threading) (pull_request): Failing after 22m

Hum. test_socket failed with a strange error. I just re-ran the job.

ERROR: testSourceAddress (test.test_socket.NetworkConnectionAttributesTest.testSourceAddress)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython/Lib/test/test_socket.py", line 442, in raise_queued_exception
    raise self.queue.get()
  File "/home/runner/work/cpython/cpython/Lib/test/test_socket.py", line 479, in clientRun
    test_func()
    ~~~~~~~~~^^
  File "/home/runner/work/cpython/cpython/Lib/test/test_socket.py", line 6048, in _testSourceAddress
    self.assertEqual(self.cli.getsockname()[1], self.source_port)
                     ~~~~~~~~~~~~~~~~~~~~^^
OSError: [Errno 9] Bad file descriptor

@stratakis

Copy link
Copy Markdown
Contributor

Nitpick: I'd change the helper name to something other as kill has a very specific meaning and the helper also closes the pipes, it doesn't only kill the processes. Maybe cleanup_process_group() or something?

For testing the assert_usable() path (10s timeout):

Fake bpftrace script:
(sleep 300) & sleep 300

And adding it to the $PATH when running the tests:
PATH="fake bpftrace path:$PATH" ./python -m test test_dtrace -m 'BPFTraceNormalTests' -v

This hangs and also leaves leaked children:
$ pgrep -af 'sleep 300'
194272 sleep 300

For testing the run_case() path (60s timeout):

Fake bpftrace script:

case "$*" in
  *'probe: success'*)
  echo 'probe: success'
  exit 0
  ;;
esac
(sleep 300) & sleep 300

PATH="fake bpftrace path:$PATH" ./python -m unittest -v test.test_dtrace.BPFTraceNormalTests.test_gc

After 60 secs it should failed and children are leaked:
$ pgrep -af 'sleep 300'
250104 sleep 300
250105 sleep 300

With the PR the assert_usable() path doesn't hang anymore and doesn't leak and the run_case() path still times outs but doesn't leak.

@vstinner

Copy link
Copy Markdown
Member Author

Tests / Sanitizers / TSan (free-threading) (pull_request): Cancelled after 60m

Oh, test_abc of ./python -m test --tsan-parallel --parallel-threads=4 -j4 -W seems to be stuck, and the job was cancelled after 60 minutes :-(

Run ./python -m test --tsan-parallel --parallel-threads=4 -j4 -W
Using random seed: 2657327358
0:00:00 load avg: 2.63 mem: 274.1 MiB Run 2 tests in parallel using 2 worker processes
0:00:13 load avg: 2.96 mem: 588.5 MiB [1/2] test_hashlib passed
0:00:43 load avg: 1.79 mem: 588.6 MiB running (1): test_abc (42.9 sec)
0:01:13 load avg: 1.08 mem: 589.1 MiB running (1): test_abc (1 min 13 sec)
0:01:43 load avg: 0.66 mem: 590.1 MiB running (1): test_abc (1 min 43 sec)
...
0:36:00 load avg: 0.07 mem: 531.8 MiB running (1): test_abc (35 min 59 sec)
0:36:30 load avg: 0.04 mem: 531.8 MiB running (1): test_abc (36 min 29 sec)
Error: The operation was canceled.

@vstinner

Copy link
Copy Markdown
Member Author

With the PR the assert_usable() path doesn't hang anymore and doesn't leak and the run_case() path still times outs but doesn't leak.

Oh thanks for testing manually! It's good that you managed to confirm that my fix works as expected :-)

Nitpick: I'd change the helper name to something other as kill has a very specific meaning and the helper also closes the pipes, it doesn't only kill the processes. Maybe cleanup_process_group() or something?

Calling proc.communicate() # Clean up in kill_process_group() is just there to call waitpid(pid) to avoid zombie process. Since the process and its child processes are killed, it should no longer have to wait for pipes to be closed, they should be already closed.

I prefer to keep "kill" in the function name, since it's the main action of the function.

Hide details View details @vstinner vstinner merged commit a064b32 into python:main Jun 17, 2026
165 of 171 checks passed
@vstinner vstinner deleted the test_dtrace_pg branch June 17, 2026 08:58
@miss-islington-app

Copy link
Copy Markdown

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.15.
🐍🍒⛏🤖

@bedevere-app

bedevere-app Bot commented Jun 17, 2026

Copy link
Copy Markdown

GH-151589 is a backport of this pull request to the 3.15 branch.

vstinner added a commit that referenced this pull request Jun 17, 2026
)

gh-151496: Use process groups in test_dtrace (GH-151512)

Create a new process group to run bpftrace commands, so it's possible
to kill also child processes on timeout.
(cherry picked from commit a064b32)

Co-authored-by: Victor Stinner <vstinner@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip news tests Tests in the Lib/test dir

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants