
tests/integration: Fix flaky TestScaleoutClusterSuite #242

Merged

Conversation

@anoopcs9
Collaborator

anoopcs9 commented Aug 27, 2022

TestScaleoutClusterSuite fails more frequently with the following error:

=== RUN   TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
    reconcile_test.go:151:
        Error Trace:    reconcile_test.go:151
        Error:          Not equal:
                        expected: 3
                        actual  : 2
        Test:           TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
        Messages:       Clustersize not as expected

The above check is to make sure that the number of replicas within the StatefulSet reflects the updated SmbShare.Spec.Scaling.MinClusterSize. But an immediate check on StatefulSet.Spec.Replicas might not always give us the desired (updated) value.

Therefore we retry this check within a brief 3-second timeout to account for any delay in the field update. In addition, we at least wait for the existence of the extra pods corresponding to the updated replica count. Considering the increased overall test time, we further raise the timeout from 20m to 30m.
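
For illustration only, here is a minimal sketch of that retry using plain client-go; the test itself uses the repo's own poll helpers, and the package name, clientset and ns/name arguments below are placeholders rather than the merged code:

package integration

import (
    "context"
    "fmt"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// waitForReplicas is a hypothetical helper (not the merged code) sketching
// the retry described above: re-read the StatefulSet until Spec.Replicas
// matches the updated MinClusterSize, giving up after a 3-second timeout.
func waitForReplicas(ctx context.Context, c kubernetes.Interface, ns, name string, want int32) error {
    ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
    defer cancel()
    for {
        sts, err := c.AppsV1().StatefulSets(ns).Get(ctx, name, metav1.GetOptions{})
        if err == nil && sts.Spec.Replicas != nil && *sts.Spec.Replicas == want {
            return nil // replica count now reflects the updated cluster size
        }
        select {
        case <-ctx.Done():
            return fmt.Errorf("replicas did not reach %d: %w", want, ctx.Err())
        case <-time.After(time.Second):
            // retry roughly once per second until the timeout expires
        }
    }
}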

@anoopcs9
Collaborator Author

Hmm... we were already on the verge of the test timeout (20m), and now with this change the possibility is higher, as we can see with the very first run.

=== RUN   TestIntegration/reconciliation/scaleoutCluster
    reconcile_test.go:142: test ID: gtxg2j
=== RUN   TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
=== RUN   TestIntegration/resourceUpdate
=== RUN   TestIntegration/resourceUpdate/SmbShareUpdateSuite
    resource_update_test.go:78: test ID: 7v842d
=== RUN   TestIntegration/resourceUpdate/SmbShareUpdateSuite/TestEditReadOnly
    resource_update_test.go:149: checking smbclient login to share
    resource_update_test.go:186: Setting readonly=true for SmbShare samba-operator-system/tshare1-7v842d
    resource_update_test.go:227: checking smbclient write to share
    resource_update_test.go:186: Setting readonly=false for SmbShare samba-operator-system/tshare1-7v842d
panic: test timed out after 20m0s

Collaborator

@phlogistonjohn left a comment

The only concern I have with the change is that it seems like we're checking things in reverse order. In other words, the operator is going to update the StatefulSet based on changes to the SmbShare, then the kube built-in controllers will update the pods based on the StatefulSet. Now the test waits for the expected number of pods before checking the values of the StatefulSet, which strikes me as a bit inverted.

I guess the test already had that flaw though. Maybe what we really want is (pseudocode):

updateSmbShare();
ctx2 := contextWithTimeout()
poll(ctx2, func() {
  l, err := StatefulSets.List(...)
  checkStatefulSet(l)
})
require.NoError(waitForPodExist(ctx, s), "smb server pod does not exist")

What do you think?

tests/integration/reconcile_test.go (outdated inline review comment, resolved)
@phlogistonjohn
Collaborator

phlogistonjohn commented Aug 27, 2022

Hmm... we were already on the verge of the test timeout (20m), and now with this change the possibility is higher, as we can see with the very first run.

On Thursday I was saying that I noticed the non-clustered runs were taking a while; when I had some networking issues the suite was timing out. I have a change to increase both by 10 mins (to 20 minutes and 30 minutes). e038c17 is clearly not yet ready for prime time, but feel free to take and adapt it for this PR if you want to.

@anoopcs9
Collaborator Author

Hmm... we were already on the verge of the test timeout (20m), and now with this change the possibility is higher, as we can see with the very first run.

On Thursday I was saying that I noticed the non-clustered runs were taking a while; when I had some networking issues the suite was timing out. I have a change to increase both by 10 mins (to 20 minutes and 30 minutes). e038c17 is clearly not yet ready for prime time, but feel free to take and adapt it for this PR if you want to.

I added the change increasing the timeout for clustered test runs from 20m to 30m.

@anoopcs9
Collaborator Author

I guess the test already had that flaw though. Maybe what we really want is (pseudocode):

updateSmbShare();
ctx2 := contextWithTimeout()
poll(ctx2, func() {
  l, err := StatefulSets.List(...)
  checkStatefulSet(l)
})
require.NoError(waitForPodExist(ctx, s), "smb server pod does not exist")

What do you think?

Ok, sounds reasonable.
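
And for the second half of that order (waiting for the extra pods once the StatefulSet has been bumped), a similarly hypothetical sketch; the helper name and label selector are assumptions for illustration, not the repo's actual waitForPodExist helper, and it relies on the same imports as the earlier sketch:

// waitForPodCount is a hypothetical sketch, not the merged code: it polls
// until at least `want` server pods exist or the caller's context expires.
func waitForPodCount(ctx context.Context, c kubernetes.Interface, ns, selector string, want int) error {
    for {
        pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
        if err == nil && len(pods.Items) >= want {
            return nil // the extra pods for the new replicas now exist
        }
        select {
        case <-ctx.Done():
            return fmt.Errorf("expected at least %d pods: %w", want, ctx.Err())
        case <-time.After(time.Second):
            // keep polling until the caller's context times out
        }
    }
}

Either way, checking the StatefulSet first and only then waiting for the pods matches the order in which the operator and the built-in controllers actually act.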

@anoopcs9 force-pushed the fix-scaleout-cluster-test branch 2 times, most recently from f9f736b to 42dc35a on August 29, 2022 09:01
@anoopcs9
Collaborator Author

Networking issues on the infra where CentOS CI is hosted: https://lists.centos.org/pipermail/ci-users/2022-August/004605.html

The Rook setup is most likely unsuccessful due to the above outage. But here is a new make check failure from golangci-lint:

internal/resources/metrics.go:14:2: could not import k8s.io/apimachinery/pkg/types (-: could not load export
data: cannot import "k8s.io/apimachinery/pkg/types" (unstable iexport format version 2, just rebuild compiler
and std library), export data is newer version - update tool) (typecheck)
	"k8s.io/apimachinery/pkg/types"
	^

@phlogistonjohn
Collaborator

Wow, that's quite the error. Oddly, that's not what I see when I look in the CI logs. Rather, I see:

go: downloading google.golang.org/protobuf v1.27.1
go: downloading github.com/matttproud/golang_protobuf_extensions v1.0.1
golangci-lint installed in /home/runner/work/samba-operator/samba-operator/.bin
/home/runner/work/samba-operator/samba-operator/.bin/golangci-lint -c .golangci.yaml run ./...
! gofmt -e -s -l . | sed 's,^,formatting error: ,' | grep 'go$'
make: *** [Makefile:214: check-format] Error 1
formatting error: tests/integration/reconcile_test.go
Error: Process completed with exit code 2.

@phlogistonjohn
Collaborator

I looked at the patch, and the CI is "right": there are formatting issues in your patch (spacing in the struct fields in the poll.Prober).
I'm wondering if the other error is a local issue on your setup with that linter?

@anoopcs9 force-pushed the fix-scaleout-cluster-test branch from 42dc35a to 0847fb2 on August 30, 2022 07:22
@anoopcs9
Collaborator Author

I looked at the patch, and the CI is "right": there are formatting issues in your patch (spacing in the struct fields in the poll.Prober).

Running gofmt with -d reported the following diff:

--- tests/integration/reconcile_test.go.orig	2022-08-30 12:49:55.592909058 +0530
+++ tests/integration/reconcile_test.go	2022-08-30 12:49:55.592909058 +0530
@@ -202,7 +202,7 @@
 		ctx, smbShare)
 	require.NoError(err)
 
-	ctx2, cancel := context.WithTimeout(s.defaultContext(), 3 * time.Second)
+	ctx2, cancel := context.WithTimeout(s.defaultContext(), 3*time.Second)
 	defer cancel()
 	s.Require().NoError(poll.TryUntil(ctx2, &poll.Prober{
 		RetryInterval: time.Second,

The -d option, which is present even in older versions, can better pinpoint the formatting error.

I'm wondering if the other error is a local issue on your setup with that linter?

internal/resources/metrics.go:14:2: could not import k8s.io/apimachinery/pkg/types (-: could not load export
data: cannot import "k8s.io/apimachinery/pkg/types" (unstable iexport format version 2, just rebuild compiler
and std library), export data is newer version - update tool) (typecheck)
	"k8s.io/apimachinery/pkg/types"
	^

I could see a few issues reported online, and it has to do with the installed versions of Go and golangci-lint. I encountered the above error on a system with the following versions installed:

# go version
go version go1.18.4 linux/amd64

# .bin/golangci-lint --version
golangci-lint has version v1.43.0 built from (unknown, mod sum: "h1:SLwZFEmDgopqZpfP495zCtV9REUf551JJlJ51Ql7NZA=") on (unknown)

Our install script restricts golangci-lint to v1.43.0, and GitHub CI runs on Go 1.16. In order to get rid of this new error I updated to the latest golangci-lint.

@phlogistonjohn
Collaborator

GitHub CI runs on Go 1.16

Oh, we should definitely update that at some point soonish.

phlogistonjohn previously approved these changes Aug 31, 2022
Collaborator

@phlogistonjohn left a comment

Looks OK to me, thanks!

@anoopcs9
Collaborator Author

anoopcs9 commented Sep 1, 2022

/test centos-ci/sink-clustered/mini-k8s-1.24

Collaborator

@phlogistonjohn left a comment

Please make the test.sh change a separate commit and then we should be all good.

TestScaleoutClusterSuite fails more frequently with the following error:

=== RUN   TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
    reconcile_test.go:151:
        Error Trace:    reconcile_test.go:151
        Error:          Not equal:
                        expected: 3
                        actual  : 2
        Test:           TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
        Messages:       Clustersize not as expected

The above check is to make sure that the number of replicas within the
StatefulSet reflects the updated SmbShare.Spec.Scaling.MinClusterSize.
But an immediate check on StatefulSet.Spec.Replicas might not always
give us the desired (updated) value.

Therefore we retry this check within a brief 3-second timeout to
account for any delay in the field update. In addition, we at least
wait for the existence of the extra pods corresponding to the updated
replica count.

Signed-off-by: Anoop C S <[email protected]>
@anoopcs9 force-pushed the fix-scaleout-cluster-test branch from db9da7a to b64253f on September 21, 2022 15:47
Collaborator

@phlogistonjohn left a comment

LGTM.
I was fine with it being a separate commit rather than a full-blown separate PR, but either way this now looks fine AFAIAC.

@phlogistonjohn merged commit fd5ef50 into samba-in-kubernetes:master on Sep 21, 2022
@anoopcs9 deleted the fix-scaleout-cluster-test branch on September 22, 2022 05:45
@anoopcs9
Collaborator Author

LGTM. I was fine with it being a separate commit rather than a full-blown separate PR, but either way this now looks fine AFAIAC.

Ah... my bad!
