tests/integration: Fix flaky TestScaleoutClusterSuite #242
Conversation
Hm.. we were already on the verge of the test timeout (20m), and now with this change the possibility of hitting it is higher, as we can see with the very first run.
The only concern I have with the change is that it seems like we're checking things in reverse order. In other words, the operator is going to update the StatefulSet based on changes to the SmbShare, then the kube built-in controllers will update the pods based on the StatefulSet. Now the test waits for the expected number of pods before checking the values of the StatefulSet, which strikes me as a bit inverted.
I guess the test already had that flaw though. Maybe what we really want is (pseudocode):
updateSmbShare()
ctx2 := contextWithTimeout()
poll(ctx2, func() {
    l, err := StatefulSets.List(...)
    checkStatefulSet(l)
})
require.NoError(waitForPodExist(ctx, s), "smb server pod does not exist")
What do you think?
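For concreteness, here is one way that pseudocode could be realized with client-go. This is a minimal sketch under assumptions: waitForReplicas is a hypothetical helper, and the namespace, name, and expected values are placeholders; the suite's real code would use its own poll.TryUntil helper instead.

// Sketch only — a hypothetical helper, not the suite's actual code.
// Assumes imports: context, fmt, time,
// "k8s.io/client-go/kubernetes", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".
func waitForReplicas(ctx context.Context, cs kubernetes.Interface,
	ns, name string, expected int32) error {
	// Bound the poll with its own short timeout, as in the pseudocode above.
	ctx2, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()
	for {
		ss, err := cs.AppsV1().StatefulSets(ns).Get(ctx2, name, metav1.GetOptions{})
		if err == nil && ss.Spec.Replicas != nil && *ss.Spec.Replicas == expected {
			return nil // StatefulSet reflects the updated MinClusterSize
		}
		select {
		case <-ctx2.Done():
			return fmt.Errorf("replicas did not reach %d: %w", expected, ctx2.Err())
		case <-time.After(time.Second):
		}
	}
}

The test would call this right after updating the SmbShare, and only then fall through to the existing waitForPodExist check.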
On Thursday I was saying that I noticed the non-clustered runs were taking a while; when I had some networking issues the suite was timing out. I have a change to increase both timeouts by 10 minutes (to 20 minutes and 30 minutes). e038c17 is clearly not yet ready for prime time, but feel free to take and adapt it for this PR if you want to.
I added the change increasing the timeout for clustered test runs from 20m to 30m.
Ok, sounds reasonable.
(force-pushed from f9f736b to 42dc35a)
Networking issues on the infra where CentOS CI is hosted: https://lists.centos.org/pipermail/ci-users/2022-August/004605.html The Rook setup is most likely unsuccessful due to the above outage. But here is a new error:
Wow, that's quite the error. Oddly, that's not what I see when I look in the CI logs. Rather, I see:
I looked at the patch, and the CI is "right": there are formatting issues in your patch (spacing in the struct fields in the poll.Prober).
(force-pushed from 42dc35a to 0847fb2)
Running the formatter gives the following diff:

--- tests/integration/reconcile_test.go.orig	2022-08-30 12:49:55.592909058 +0530
+++ tests/integration/reconcile_test.go	2022-08-30 12:49:55.592909058 +0530
@@ -202,7 +202,7 @@
 		ctx, smbShare)
 	require.NoError(err)
-	ctx2, cancel := context.WithTimeout(s.defaultContext(), 3 * time.Second)
+	ctx2, cancel := context.WithTimeout(s.defaultContext(), 3*time.Second)
 	defer cancel()
 	s.Require().NoError(poll.TryUntil(ctx2, &poll.Prober{
 		RetryInterval: time.Second,
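For reference, a diff like this can be reproduced and fixed locally with gofmt, whose -d flag prints diffs and -w flag rewrites files in place:

gofmt -d tests/integration/reconcile_test.go
gofmt -w tests/integration/reconcile_test.go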
I could see a few issues reported online, and it has to do with the installed versions of Go and its tooling. Our install script restricts the version we install.
Oh, we should definitely update that at some point soonish.
Looks OK to me, thanks!
/test centos-ci/sink-clustered/mini-k8s-1.24
(force-pushed from 5ac552a to db9da7a)
Please make the test.sh change a separate commit and then we should be all good.
TestScaleoutClusterSuite fails more frequently with the following error:

=== RUN   TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
    reconcile_test.go:151:
        Error Trace: reconcile_test.go:151
        Error:       Not equal:
                     expected: 3
                     actual  : 2
        Test:        TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
        Messages:    Clustersize not as expected

The above check is to make sure that the number of replicas within the StatefulSet reflects the updated SmbShare.Spec.Scaling.MinClusterSize. But an immediate check on StatefulSet.Spec.Replicas might not always give us the desired (updated) value. Therefore we retry this check within a brief 3 second timeout to account for any delay in the field update. In addition, we at least wait for the existence of the extra pods corresponding to the updated replica count.

Signed-off-by: Anoop C S <[email protected]>
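As an illustration of the second half of that fix (waiting for the extra pods), here is a minimal client-go sketch. The helper name waitForPodCount and the label-selector parameter are hypothetical, standing in for the suite's real helpers:

// Sketch only — a hypothetical helper illustrating "wait for the extra pods".
// Assumes imports: context, fmt, time,
// "k8s.io/client-go/kubernetes", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".
func waitForPodCount(ctx context.Context, cs kubernetes.Interface,
	ns, labelSelector string, expected int) error {
	for {
		pods, err := cs.CoreV1().Pods(ns).List(ctx,
			metav1.ListOptions{LabelSelector: labelSelector})
		if err == nil && len(pods.Items) >= expected {
			return nil // every new replica has a corresponding pod
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("expected %d pods: %w", expected, ctx.Err())
		case <-time.After(time.Second):
		}
	}
}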
(force-pushed from db9da7a to b64253f)
LGTM.
I was fine with it being a separate commit rather than a full-blown separate PR, but either way this now looks fine AFAIAC.
Ah.. my bad!
TestScaleoutClusterSuite fails more frequently with the error shown in the commit message above. The check in question is to make sure that the number of replicas within the StatefulSet reflects the updated SmbShare.Spec.Scaling.MinClusterSize, but an immediate check on StatefulSet.Spec.Replicas might not always give us the desired (updated) value.

Therefore we retry this check within a brief 3 second timeout to account for any delay in the field update. In addition, we at least wait for the existence of the extra pods corresponding to the updated replica count. Considering the increased overall test time, we further raise the timeout from 20m to 30m.
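Taken together with the sketches above, the intended order of checks in the test would then be, in outline (hypothetical names again; the suite's real code uses poll.TryUntil and its own helpers):

// Hypothetical usage of the two sketches above. The "samba-server" name
// and label selector are placeholders, not the operator's actual values.
require.NoError(t, waitForReplicas(ctx, cs, ns, "samba-server", 3))
require.NoError(t, waitForPodCount(ctx, cs, ns, "app=samba-server", 3))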