testing: "panic: Log in goroutine after Test..." is unreliable due to lack of synchronization on t.done #67701

Groxx · 2024-05-29T19:27:49Z

Go version

go1.22.3 darwin/arm64

Output of `go env` in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/stevenl/Library/Caches/go-build'
GOENV='/Users/stevenl/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/stevenl/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/stevenl/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/stevenl/go/pkg/mod/golang.org/[email protected]'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/stevenl/go/pkg/mod/golang.org/[email protected]/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/stevenl/go/src/github.com/uber/cadence/go.mod'
GOWORK='/Users/stevenl/go/src/github.com/uber/cadence/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/zp/sqm92pj94sq7whfh1rl3yjd00000gn/T/go-build780465693=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

/*
❯ cat testlogger_test.go
*/
package testlogger

import (
	"fmt"
	"testing"
	"time"
)

func TestMain(m *testing.M) {
	m.Run()
	time.Sleep(100 * time.Millisecond) // also sometimes works without testmain, but more consistent with next test
}

func TestABefore(t *testing.T) {
	go func() {
		t.Logf("sleeping")
		time.Sleep(10 * time.Millisecond)
		t.Logf("too late")
		fmt.Println("goroutine done") // prove it ran
	}()
	time.Sleep(time.Millisecond)
}

func TestZLater(t *testing.T) {
	t.Logf("another test")
	time.Sleep(time.Second) // make sure the process hangs around long enough
}

/*
❯ go test -count 1 -v -race ./...
=== RUN   TestABefore
    testlogger_test.go:15: sleeping
--- PASS: TestABefore (0.00s)
=== RUN   TestZLater
    testlogger_test.go:24: another test
goroutine done
--- PASS: TestZLater (1.00s)
PASS
ok  	example.com/testlogger_test.go	1.465s
*/

Despite clearly logging after the test has completed, this test does not reliably fail, depending on other code in the test and elsewhere. And when it does not fail, t.Logf("too late") is silently not logged anywhere.

This is hard to reliably reproduce, but the cause is fairly straightforward: whether it logs or panics is determined by checking t.done inside the test's mutex, but t.done is changed outside the mutex, so that value change may not propagate to the goroutine that produces the faulty log.

Forcing a happens-after relationship reliably causes failure:

/*
❯ cat testlogger_test.go
*/
package testlogger

import (
	"fmt"
	"testing"
	"time"
)

var done = make(chan struct{})

func TestMain(m *testing.M) {
	m.Run()
	close(done)  // happens-after t.done = true
	time.Sleep(100 * time.Millisecond)
}
func TestABefore(t *testing.T) {
	go func() {
		t.Logf("sleeping")
		time.Sleep(10 * time.Millisecond)
		<-done
		t.Logf("too late") // now also happens-after t.done = true
		fmt.Println("goroutine done")
	}()
	time.Sleep(time.Millisecond)
}

func TestZLater(t *testing.T) {
	t.Logf("another test")
	time.Sleep(time.Second)
}

/*
❯ go test -count 1 -v -race ./...
=== RUN   TestABefore
    testlogger_test.go:17: sleeping
--- PASS: TestABefore (0.00s)
PASS
panic: Log in goroutine after TestABefore has completed: too late

goroutine 35 [running]:
testing.(*common).logDepth(0x14000410820, {0x14000133f70, 0x8}, 0x3)
	/Users/stevenl/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1028 +0x3d0
testing.(*common).log(...)
	/Users/stevenl/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1010
testing.(*common).Logf(0x14000410820, {0x1006c4df4?, 0x0?}, {0x0?, 0x0?, 0x0?})
	/Users/stevenl/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1061 +0x5c
example.com/testlogger.TestABefore.func1()
	/Users/stevenl/go/src/example.com/testlogger/testlogger_test.go:19 +0x6c
created by example.com/testlogger.TestABefore in goroutine 34
	/Users/stevenl/go/src/example.com/testlogger/testlogger_test.go:16 +0x5c
FAIL	example.com/testlogger	0.331s
FAIL
*/

As much as I am fond of this race-detecting trick, and appreciate the issues it catches:
https://github.com/golang/go/blob/master/src/testing/testing.go#L1675-L1677

		// Do not lock t.done to allow race detector to detect race in case
		// the user does not appropriately synchronize a goroutine.
		t.done = true

I feel like this log-failure-flakiness is an unintended consequence. Maybe the race-detecting access could be moved to a different field, rather than one that has a rather significant impact on behavior?

It's also strange to me that this doesn't seem to be (reliably) catching this race - is that happening inside t.checkRaces(), so this all races later than the check? I'm not familiar with this part of race-detection: https://github.com/golang/go/blob/master/src/testing/testing.go#L1548C1-L1549C1

What did you see happen?

❯ go test -count 1 -v  ./...
...
PASS
ok  	example.com/testlogger_test.go	1.465s

Tests pass and the too-late log is not caught nor is its output sent to stdout/stderr.

What did you expect to see?

panic: Log in goroutine after TestABefore has completed: too late

should be produced reliably, as long as the test clearly finishes before the log occurs (as seen from a real-world observer, i.e. me). And it should not require outside-of-test synchronization to behave reliably.

Proof-of-concept fix

#67796

The text was updated successfully, but these errors were encountered:

mknyszek · 2024-06-03T15:03:46Z

CC @adonovan @neild via https://dev.golang.org/owners

Groxx · 2024-06-03T18:57:14Z

Added a proof-of-concept fix over in #67796

gopherbot · 2024-06-03T19:03:03Z

Change https://go.dev/cl/590055 mentions this issue: testing: fix racy t.done usage by adding a separate race-canary field

Groxx · 2025-01-15T22:34:18Z

sort of related, I'm curious why this minor modification doesn't trigger the race detector:

var done = make(chan struct{})

func TestMain(m *testing.M) {
	m.Run()
	fmt.Println(done)
}

func TestABefore(t *testing.T) {
	go func() {
		time.Sleep(10 * time.Millisecond)
		done = nil
		fmt.Println("goroutine done")
	}()
	time.Sleep(time.Millisecond)
}

func TestZLater(t *testing.T) {
	time.Sleep(time.Second)
}

but removing the TestZLater or removing its sleep does (and e.g. moving it to after the TestMain read, so the process doesn't end too soon).

It seems like this should be a fairly provable race, since nothing at all synchronizes between the assignment and the TestMain read.

ianlancetaylor · 2025-01-16T00:12:39Z

@Groxx That is most likely because running a test actually involves a couple of goroutines that synchronize with each other. Doing that synchronization around running TestZLater may appear to be a synchronization after the TestABefore goroutine completes.

Groxx mentioned this issue May 29, 2024

Suppress test failures due to logs after tests complete cadence-workflow/cadence#6067

Merged

Groxx changed the title ~~testing: "Log in goroutine after Test..." is unreliable due to lack of synchronization on t.done~~ testing: "panic: Log in goroutine after Test..." is unreliable due to lack of synchronization on t.done May 30, 2024

mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 3, 2024

mknyszek added this to the Backlog milestone Jun 3, 2024

Groxx linked a pull request Jun 3, 2024 that will close this issue

testing: fix racy t.done usage by adding a separate race-canary field #67796

Open

gabyhelp mentioned this issue Jul 2, 2024

cmd/trace, x/exp/trace: nondeterministic output, seems related to timestamp tie-breaking #68277

Closed

gabyhelp mentioned this issue Aug 15, 2024

runtime: confusing panic on parallel calls to yield function #68897

Open

This was referenced Dec 9, 2024

testing/synctest: panic with time.NewTimer #70741

Closed

testing/synctest: blocks forever when use time.Tick #70857

Closed

gabyhelp added the Bug label Jan 4, 2025

jba added BugReport Issues describing a possible bug in the Go implementation. and removed Bug labels Jan 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing: "panic: Log in goroutine after Test..." is unreliable due to lack of synchronization on t.done #67701

testing: "panic: Log in goroutine after Test..." is unreliable due to lack of synchronization on t.done #67701

Groxx commented May 29, 2024 •

edited

Loading

mknyszek commented Jun 3, 2024

Groxx commented Jun 3, 2024

gopherbot commented Jun 3, 2024

Groxx commented Jan 15, 2025 •

edited

Loading

ianlancetaylor commented Jan 16, 2025

testing: "panic: Log in goroutine after Test..." is unreliable due to lack of synchronization on t.done #67701

testing: "panic: Log in goroutine after Test..." is unreliable due to lack of synchronization on t.done #67701

Comments

Groxx commented May 29, 2024 • edited Loading

Go version

Output of go env in your module/workspace:

What did you do?

What did you see happen?

What did you expect to see?

Proof-of-concept fix

mknyszek commented Jun 3, 2024

Groxx commented Jun 3, 2024

gopherbot commented Jun 3, 2024

Groxx commented Jan 15, 2025 • edited Loading

ianlancetaylor commented Jan 16, 2025

Groxx commented May 29, 2024 •

edited

Loading

Output of `go env` in your module/workspace:

Groxx commented Jan 15, 2025 •

edited

Loading