bpo-34060: Report system load when running test suite for Windows#8357
bpo-34060: Report system load when running test suite for Windows#8357csabella merged 4 commits into
Conversation
ede0a5a to
da62440
Compare
July 20, 2018 20:40
zooba
left a comment
There was a problem hiding this comment.
I like this! And I'll be happy to have support for job objects in there too :)
Looking at the test runs, the numbers seem to be consistent with other platforms, but since I don't have as good a feel for what to expect here, I'd like someone else who's been involved in this or the previous PR to sign off as well.
Sorry, something went wrong.
Yeah I'd definitely like to get @vstinner's take on it since he is likely familiar with normal load values. |
Sorry, something went wrong.
Did I say thatI? A thread is fine here. regrtest already uses threads to run tests in subprocesses when using -jN. faulthandler also uses a C thread to implement an hard timeout (dumping the Python traceback on timeout). regrtest is full of threads :-) Overlapped IO may be more complicated than a thread, no? |
Sorry, something went wrong.
|
This is the failure that shows up when using a thread: |
Sorry, something went wrong.
fc8c05d to
63b57b2
Compare
July 21, 2018 12:35
|
So I think I jumped the gun early with the job grouping stuff. There was a much easier solution to dealing with interpreter crashes in This is now actually just pure python and consequently a lot simpler. |
Sorry, something went wrong.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be poked with soft cushions! |
Sorry, something went wrong.
6990d2c to
a51f181
Compare
July 25, 2018 02:38
|
I have made the requested changes; please review again |
Sorry, something went wrong.
|
It looks like there was a lot of interest and activity on this PR a few months ago and @zooba had approved it. @ammaraskar, could you resolve the merge conflict? I think that might be all that is needed, along with @vstinner's approval for merging. Thanks! |
Sorry, something went wrong.
It seems like my comments have been addressed.
|
I'm sorry but I don't have the bandwidth right to review this change (test it manually). @ammaraskar: You have to update your PR, there is now a conflict. @zware: If you are confident that the change is good, please go ahead and merge it (once CI tests and the conflict is solved). |
Sorry, something went wrong.
|
I haven't researched whether this is the best way to do this to any extent, but this looks fine to me. |
Sorry, something went wrong.
While Windows exposes the system processor queue length: the raw value used for load calculations on Unix systems, it does not provide an API to access the averaged value. Hence to calculate the load we must track and average it ourselves. We can't use multiprocessing or a thread to read it in the background while the tests run since using those would conflict with test_multiprocessing and test_xxsubprocess. Thus, we use Window's asynchronous IO API to run the tracker in the background with it sampling at the correct rate. When we wish to access the load we check to see if there's new data on the stream, if there is, we update our load values.
fb0a14d to
31a71df
Compare
February 9, 2019 22:44
31a71df to
a8df864
Compare
February 10, 2019 02:13
* Clean up code which checked presence of os.{stat,lstat,chmod} (GH-11643)
(cherry picked from commit 8377cd4)
* bpo-36725: regrtest: add TestResult type (GH-12960)
* Add TestResult and MultiprocessResult types to ensure that results
always have the same fields.
* runtest() now handles KeyboardInterrupt
* accumulate_result() and format_test_result() now takes a TestResult
* cleanup_test_droppings() is now called by runtest() and mark the
test as ENV_CHANGED if the test leaks support.TESTFN file.
* runtest() now includes code "around" the test in the test timing
* Add print_warning() in test.libregrtest.utils to standardize how
libregrtest logs warnings to ease parsing the test output.
* support.unload() is now called with abstest rather than test_name
* Rename 'test' variable/parameter to 'test_name'
* dash_R(): remove unused the_module parameter
* Remove unused imports
(cherry picked from commit 4d29983)
* bpo-36725: Refactor regrtest multiprocessing code (GH-12961)
Rewrite run_tests_multiprocess() function as a new MultiprocessRunner
class with multiple methods to better report errors and stop
immediately when needed.
Changes:
* Worker processes are now killed immediately if tests are
interrupted or if a test does crash (CHILD_ERROR): worker
processes are killed.
* Rewrite how errors in a worker thread are reported to
the main thread. No longer ignore BaseException or parsing errors
silently.
* Remove 'finished' variable: use worker.is_alive() instead
* Always compute omitted tests. Add Regrtest.get_executed() method.
(cherry picked from commit 3cde440)
* bpo-36719: regrtest always detect uncollectable objects (GH-12951)
regrtest now always detects uncollectable objects. Previously, the
check was only enabled by --findleaks. The check now also works with
-jN/--multiprocess N.
--findleaks becomes a deprecated alias to --fail-env-changed.
(cherry picked from commit 75120d2)
* bpo-34060: Report system load when running test suite for Windows (GH-8357)
While Windows exposes the system processor queue length, the raw value
used for load calculations on Unix systems, it does not provide an API
to access the averaged value. Hence to calculate the load we must track
and average it ourselves. We can't use multiprocessing or a thread to
read it in the background while the tests run since using those would
conflict with test_multiprocessing and test_xxsubprocess.
Thus, we use Window's asynchronous IO API to run the tracker in the
background with it sampling at the correct rate. When we wish to access
the load we check to see if there's new data on the stream, if there is,
we update our load values.
(cherry picked from commit e16467a)
* bpo-36719: Fix regrtest re-run (GH-12964)
Properly handle a test which fail but then pass.
Add test_rerun_success() unit test.
(cherry picked from commit 837acc1)
* bpo-36719: regrtest closes explicitly WindowsLoadTracker (GH-12965)
Regrtest.finalize() now closes explicitly the WindowsLoadTracker
instance.
(cherry picked from commit 00db7c7)
This is (mostly) a pure Python implementation of the other PR. It leverages the
typeperfcommand which monitors performance counters and outputs them at a given interval. So every 5 seconds,typeperfcan output the processor queue length into stdout.subprocess.stdout.readlineis a blocking call. Using a thread seemed like an obvious solution, but we can't achieve this with multiprocessing or a thread, because like Victor speculated in the previous bug report on this, it conflicts withtest_multiprocessingandtest_threading. Hence, I opted to use the asynchronous/overlapped IO API which was designed forasync. Most of the diff actually just pertains to using this rather low level API.This is almost a pure python implementation but there was one edge case where this would fail. Namely, when the python interpreter running the test suite crashes, this leaves an orphaned
typeperfprocess running which refuses to die. This means that when the test suite is run with-j xand this situation happens:The big test coordinating python process will wait forever on the crashed python and consequently
typeperfto terminate, which just doesn't happen by default in Windows. After reading up on the APIs, the right way to fix this is by using a Job Object to ask the OS to kill the child when the parent dies. Hence, there is a change in_winapito make this happen. Unlike the last PR, this API is actually reusable and fit to be exposed to the public. It could even allow implementing things like bpo-5115 to be a lot easier.https://bugs.python.org/issue34060