kola: test panics #2701

Closed
saqibali-2k opened this issue Feb 9, 2022 · 1 comment · Fixed by #2710

Comments

@saqibali-2k
Member

Recently we have seen kola panic when a timeout occurs:

[2022-02-09T15:25:22.847Z] === RUN   non-exclusive-test-bucket-0/ext.config.podman.dns
[2022-02-09T15:32:59.324Z] panic: test executed panic(nil) or runtime.Goexit
[2022-02-09T15:32:59.324Z] 
[2022-02-09T15:32:59.324Z] goroutine 17203 [running]:
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/harness.tRunner.func1(0xc0004222c0)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/harness/harness.go:470 +0x2fe
[2022-02-09T15:32:59.324Z] runtime.Goexit()
[2022-02-09T15:32:59.324Z] 	/usr/lib/golang/src/runtime/panic.go:613 +0x1e5
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/harness.(*H).FailNow(0xc00088e2c0)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/harness/harness.go:293 +0x3c
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/harness.(*H).Fatalf(0xc00088e2c0, 0x22cc77b, 0x10, 0xc0006d59e8, 0x2, 0x2)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/harness/harness.go:336 +0x87
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/harness.(*H).runTimeoutCheck(0xc00088e2c0, 0x2642308, 0xc000ac06c0, 0x8bb2c97000, 0xc000644000, 0xc001450030, 0x26)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/harness/harness.go:101 +0x1dd
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/harness.(*H).RunWithExecTimeoutCheck(0xc00088e2c0, 0xc000644000, 0xc001450030, 0x26)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/harness/harness.go:128 +0x7b
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/kola/cluster.(*TestCluster).SSH(0xc0006d5df8, 0x265d2e8, 0xc000ac4600, 0xc001450000, 0x21, 0x7f1a3415ca40, 0x48bb05, 0x1, 0x7f1a5d23c5b8, 0x30)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/kola/cluster/cluster.go:159 +0x1a5
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/kola/cluster.(*TestCluster).MustSSH(0xc0006d5df8, 0x265d2e8, 0xc000ac4600, 0xc001450000, 0x21, 0xc001450000, 0x21, 0xc000aa4060)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/kola/cluster/cluster.go:176 +0x74
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/kola/cluster.(*TestCluster).MustSSHf(0xc0006d5df8, 0x265d2e8, 0xc000ac4600, 0x22cd0db, 0x10, 0xc0006d5d90, 0x1, 0x1, 0x1c6a040, 0x3452570, ...)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/kola/cluster/cluster.go:190 +0x93
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/kola.collectLogsExternalTest(0xc0004222c0, 0xc000a2c1e0, 0xc00088e2c0, 0x26590c8, 0xc00000e4b0, 0x0, 0x0, 0x0, 0x0)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/kola/harness.go:1162 +0x431
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/kola.makeNonExclusiveTest.func1.1(0xc0004222c0)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/kola/harness.go:1291 +0x15d
[2022-02-09T15:32:59.324Z] github.com/coreos/mantle/harness.tRunner(0xc0004222c0, 0xc000130a00)
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/harness/harness.go:512 +0x104
[2022-02-09T15:32:59.324Z] created by github.com/coreos/mantle/harness.(*H).RunTimeout
[2022-02-09T15:32:59.324Z] 	/root/containerbuild/mantle/harness/harness.go:557 +0x38e
[2022-02-09T15:32:59.324Z] qemu-system-x86_64: terminating on signal 15 from pid 4886 ()
[2022-02-09T15:32:59.324Z] qemu-system-x86_64: tpm-emulator: Could not cleanly shutdown the TPM: Input/output error

More information will be added to this issue as I investigate further.

@saqibali-2k
Member Author

Steps to reproduce:
Drop the timeout to 1 minute in harness/harness.go and add the line below to kola/harness.go:

@@ -1159,6 +1159,7 @@ func collectLogsExternalTest(h *harness.H, t *register.Test, tcluster cluster.Te
                        return
                }
                defer f.Close()
+               time.Sleep(time.Minute * 1)
                out := tcluster.MustSSHf(mach, "journalctl -t %s", unit)
                if _, err = f.WriteString(string(out)); err != nil {

saqibali-2k added a commit to saqibali-2k/coreos-assembler that referenced this issue Feb 15, 2022
Recently we have seen non-exclusive tests panicking due to
runtime.Goexit executing, but somehow harness.finished being false.
This problem was occurring because we pass tcluster to collectLogsExternalTest
instead of newTC. tcluster has a reference to the harness of the wrapper test,
thus harness.finished is incorrectly set to true for the wrapper test's harness
instead of the non-exclusive subtest's harness.

closes: coreos#2701
dustymabe pushed a commit that referenced this issue Feb 15, 2022
Recently we have seen non-exclusive tests panicking due to
runtime.Goexit executing, but somehow harness.finished being false.
This problem was occurring because we pass tcluster to collectLogsExternalTest
instead of newTC. tcluster has a reference to the harness of the wrapper test,
thus harness.finished is incorrectly set to true for the wrapper test's harness
instead of the non-exclusive subtest's harness.

closes: #2701
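
For anyone reading this later, the mechanism described in the commit message can be sketched in a few lines of Go. This is a simplified illustration, not the mantle harness code: the `H` type, its `finished`/`failed` fields, and the `runTest`, `buggy`, and `fixed` helpers are hypothetical stand-ins, and the sketch returns the panic message as a string instead of actually panicking. It assumes only what the trace and commit message state: `FailNow` marks a harness as finished and calls `runtime.Goexit`, and the runner's deferred check complains when the goroutine exits while its own harness is still unfinished.

```go
package main

import (
	"fmt"
	"runtime"
)

// H is a hypothetical, stripped-down stand-in for a test harness handle.
type H struct {
	finished bool
	failed   bool
}

// FailNow marks this harness as failed and finished, then stops the
// calling goroutine. Deferred functions still run during Goexit.
func (h *H) FailNow() {
	h.failed = true
	h.finished = true
	runtime.Goexit()
}

// runTest runs body in its own goroutine on behalf of h and reports
// how that goroutine ended, mimicking the runner's deferred check.
func runTest(h *H, body func()) string {
	result := make(chan string, 1)
	go func() {
		defer func() {
			err := recover() // nil during Goexit, since nothing is panicking
			switch {
			case err != nil:
				result <- fmt.Sprintf("panic: %v", err)
			case !h.finished:
				// The goroutine is exiting, but nobody marked h as finished.
				result <- "panic: test executed panic(nil) or runtime.Goexit"
			case h.failed:
				result <- "failed cleanly"
			default:
				result <- "passed"
			}
		}()
		body()
		h.finished = true
	}()
	return <-result
}

// buggy fails the subtest through the wrapper's harness, so the
// subtest's own harness never sees finished == true.
func buggy() string {
	wrapper, sub := &H{}, &H{}
	return runTest(sub, func() { wrapper.FailNow() })
}

// fixed fails the subtest through its own harness.
func fixed() string {
	sub := &H{}
	return runTest(sub, func() { sub.FailNow() })
}

func main() {
	fmt.Println("buggy:", buggy())
	fmt.Println("fixed:", fixed())
}
```

In the buggy wiring the wrapper's flag is set while the subtest's goroutine is the one that exits, which matches the "test executed panic(nil) or runtime.Goexit" message in the trace above. Passing the subtest's own cluster (and therefore its own harness) corresponds to the `fixed` case.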