test: fix races in test-performance-eventlooputil#36028
Conversation
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race.
Sorry, something went wrong.
|
Trying to start a stress test for this but CI stress test is not starting for some reason. Anyway, a similar stress test a couple days ago against master showed 12 failures on osx1015 in 1000 runs. So hopefully we can run that on this PR, see that it fixes things, and I can approve this and close my PR to move this test to sequential. |
Sorry, something went wrong.
|
@Trott Please note that you refer to a different test (test-worker-eventlooputil) - this PR modifies test-performance-eventlooputil |
Sorry, something went wrong.
|
Stress tests: |
Sorry, something went wrong.
trevnorris
left a comment
There was a problem hiding this comment.
I'm working on getting the test as it is on my machine to fail from a stress test, but doesn't want to.
Mind checking if moving setting elu1, but not changing anything else, fixes the issue?
Sorry, something went wrong.
As far as I remember I tried this but then the second race mentioned in description jumped in and resulted in a sporadic fail in the next assert. |
Sorry, something went wrong.
|
@Flarna Thanks. Was able to reproduce. See what you're saying. I'm testing out some variations w/ you PR. Will get back in a bit. |
Sorry, something went wrong.
trevnorris
left a comment
There was a problem hiding this comment.
@Flarna Thanks for the fix. Looks good.
Sorry, something went wrong.
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race. PR-URL: nodejs#36028 Fixes: nodejs#35309 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race. PR-URL: #36028 Fixes: #35309 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race. PR-URL: #36028 Fixes: #35309 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
Fix two races in test-performance-eventlooputil resulting in a flaky test.
elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in
elu3.active is lower then the expected spin time.
Moving capturing of t after capturing t, just before the spin look
avoids this.
Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop
fails.
Exchanging these statements avoids this race.
Fixes: #35309
fyi @trevnorris