-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: reset failuresSuppressed
in resetFailures
#120497
roachtest: reset failuresSuppressed
in resetFailures
#120497
Conversation
In cockroachdb#119535, we introduced a `resetFailures` method that is called after a test failure when the test runner identifies that a VM was preempted while the test was running. This function makes sure that the VM preemption error is the only one visible to the issue poster, ensuring that whenever a VM is preempted, the test failure is routed to Test Eng. However, there was one situation where we still would not get the routing right: when the test times out. In this situation, we set the internal `failuresSuppressed` field to true; in other words, `resetFailures` followed by the VM preempmtion error would not have the desired effect of making the VM preemption error visible to the issue poster. While cockroachdb#119535 solves the most common source of test timeouts due to preemption (a bug in the monitor), tests can run arbitrary code and can timeout if a VM is preempted. In this commit, we reset `failuresSuppressed` in `resetFailures` to make sure that the VM preemption error is taken into account when reporting failures in arbitrary test timeouts. Fixes: cockroachdb#120381 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this solves the case where the VM is preempted but the monitor doesn't error whereas the previous PR solves the case where the monitor does error?
Am I understanding that correctly?
I think it's hard to state this in terms of whether the monitor errors or not because not every test uses a monitor (the #119535 fixed the monitor such that, if a VM was preempted, the monitor would fail (instead of hanging until the test timed out). This was by far the most common type of test timeout due to VM preemption. It didn't change any behaviour as far as "what happens when a test times out", it just made timeouts due to VM preemption a lot less likely. However, tests may timeout for other reasons: they are arbitrary code, so if they use an unbounded retry loop, for instance, they could also timeout in ways that are outside our control. What this current PR does is make sure that when the test does time out, we are still able to route the failure to test eng if there was a VM preemption. Prior to this PR, the VM preemption would be identified but we would fail to add the failure record because the timeout happened first (and therefore Does that make more sense? |
Ah yes that makes much more sense. I was trying to think of why a preempted VM wouldn't have the monitor error out, when there just isn't a monitor in the first place. That seems straightforward enough to me then. Thanks! |
TFTR! bors r=DarrylWong |
Build succeeded: |
In #119535, we introduced a
resetFailures
method that is called after a test failure when the test runner identifies that a VM was preempted while the test was running. This function makes sure that the VM preemption error is the only one visible to the issue poster, ensuring that whenever a VM is preempted, the test failure is routed to Test Eng.However, there was one situation where we still would not get the routing right: when the test times out. In this situation, we set the internal
failuresSuppressed
field to true; in other words,resetFailures
followed by the VM preempmtion error would not have the desired effect of making the VM preemption error visible to the issue poster. While #119535 solves the most common source of test timeouts due to preemption (a bug in the monitor), tests can run arbitrary code and can timeout if a VM is preempted.In this commit, we reset
failuresSuppressed
inresetFailures
to make sure that the VM preemption error is taken into account when reporting failures in arbitrary test timeouts.Fixes: #120381
Release note: None