staticcheck: rare false positive in U1000 #642

Closed

cespare opened this issue Nov 4, 2019 · 6 comments
cespare commented Nov 4, 2019

I'm in the process of tracking down what I believe to be a rare, non-deterministic false positive in staticcheck. Unfortunately, this is an issue we've only seen in our CI system running on private code; I've been unable to reliably reproduce it so far.

This is the current version we're using:

$ staticcheck -debug.version
staticcheck 2019.2.3

Compiled with Go version: go1.13
Main module:
        honnef.co/go/[email protected] (sum: h1:3JgtbtFHMiCmsznwGVTUWbgGov+pVqnlf1dEJTNAXeM=)
Dependencies:
        github.com/BurntSushi/[email protected] (sum: h1:WXkYYl6Yr3qBf1K79EBnL4mak0OimBfB0XUf9Vl28OQ=)
        golang.org/x/[email protected] (sum: h1:MQEvx39qSf8vyrx3XRaOe+j1UDIzKwkYOVObRgGPVqI=)

We've seen this false positive occur with Go 1.13.4 and previously with Go 1.12.4.

We also previously saw the false positives with staticcheck 2019.2 and 2019.2.2, but not 2019.1.1 AFAICT.

We're running staticcheck as

staticcheck root/...

(That is, a package selection expression that includes all our Go packages.)

Our CI environment is running on linux/amd64.


Some things I noticed when I browsed the set of false positives:

We've seen 16 false positives in the ~4 months since we upgraded from staticcheck 2019.1.1 to 2019.2. During that time we've had tens or hundreds of thousands of runs in CI (don't have exact numbers available).

All these false positives share a common feature: they flag an unexported name which is defined in a non-test file and is referred to exactly once, in a test file. For example, 6/16 of the false positives were for a particular method similar to this:

// server.go
type server struct {...}
func (s *server) close() { ... }

// server_test.go
func TestXYZ(t *testing.T) {
	s := newTestServer(...)
	defer s.close() // only time s.close is used anywhere

	// ...
}

We have examples with methods, plain functions, and also vars.
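
For concreteness, here's a self-contained sketch of that shape (newTestServer is a hypothetical helper, not code from our actual repo):

// server.go
package mypkg

type server struct {
	closed bool
}

// close has no callers outside the test files.
func (s *server) close() {
	s.closed = true
}

// server_test.go
package mypkg

import "testing"

// newTestServer is a hypothetical helper that builds a server for a test.
func newTestServer(t *testing.T) *server {
	t.Helper()
	return &server{}
}

func TestXYZ(t *testing.T) {
	s := newTestServer(t)
	defer s.close() // the only use of close anywhere
}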

I'm working to try to reproduce this, but perhaps this is enough of a clue to give you a lead.

cespare added the false-positive and needs-triage labels Nov 4, 2019
cespare (Author) commented Nov 5, 2019

I have 384 CPUs working on this.

dominikh added the waiting-for-feedback label and removed the needs-triage label Nov 5, 2019
viswajithiii commented Nov 5, 2019

Can confirm that I've seen this too. FWIW, I thought this was intended behavior and not a false positive -- it seems like an unexported function used only in a test should be in an _test.go file. But the non-determinism was definitely not ideal.

cespare (Author) commented Nov 5, 2019

@viswajithiii that's an interesting point. It's true that the non-test package, considered in isolation, does have a genuinely unused function/var.

@dominikh can you confirm the intended behavior?

dominikh (Owner) commented Nov 5, 2019

An identifier declared in the non-test variant of a package can be marked as used by the test variant of the package. This is intended behavior (and part of why U1000 is special in staticcheck: it has to see all packages before it can make a decision).

There is definitely a camp that thinks these objects should be defined in the test variant if only tests use them, but not even the standard library abides by that. And xtests make the situation even more confusing. So we opted for the least noisy, least opinionated approach, which allows the greatest number of objects to be marked as used.

The non-determinism, however, is probably caused by a bug in the aforementioned special logic. Maybe we're racy, not waiting for the completion of all packages? Or maybe we're order-dependent for some reason.

cespare (Author) commented Nov 5, 2019

Status update:

I've been able to reproduce within a few minutes by running a whole bunch of staticchecks in parallel. I was able to narrow it down a bit to a failing invocation like staticcheck -checks U1000 one/particular/package. If I run that with GOMAXPROCS=8 or so, it seems to fail after a few hundred runs on average.

For all these tests, each run of staticcheck has its own fresh XDG_CACHE_HOME, GOCACHE, and TMPDIR, so AFAIK there's no shared on-disk state in play.

In addition to 2019.2.3, I've reproduced this at master (f51cd49).

I've also reproduced this with staticcheck built with -race. There were no race warnings printed out, and it seemed to take longer to repro (not just wall time, but iterations -- I had to run about 3000 before I got a failure).

I've tried reproducing this with a minimal synthetic package but I couldn't get any failures after about 100k runs.
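
The stress setup is roughly the following (a sketch of the idea, not the exact harness I'm running; the package pattern is a placeholder):

package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

func main() {
	const workers = 8
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for run := 0; run < 1000; run++ {
				// Give every invocation its own cache and temp dirs so
				// there is no shared on-disk state between runs.
				dir, err := os.MkdirTemp("", "sc-stress")
				if err != nil {
					panic(err)
				}
				cmd := exec.Command("staticcheck", "-checks", "U1000", "one/particular/package")
				cmd.Env = append(os.Environ(),
					"XDG_CACHE_HOME="+dir,
					"GOCACHE="+dir+"/gocache",
					"TMPDIR="+dir,
				)
				// A clean run prints nothing, so any output (findings
				// or errors) is surfaced as a hit.
				if out, _ := cmd.CombinedOutput(); len(out) > 0 {
					fmt.Printf("worker %d, run %d:\n%s", w, run, out)
				}
				os.RemoveAll(dir)
			}
		}(w)
	}
	wg.Wait()
}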

cespare added a commit to cespare/go-tools that referenced this issue Nov 7, 2019
Run with:

(cd ./cmd/staticcheck && go build)
rm -rf ~/.cache/staticcheck
./cmd/staticcheck/staticcheck -checks U1000 ./pattern

This won't always trigger all 3 false positives, but it usually triggers
at least some.
cespare (Author) commented Nov 7, 2019

Took me way too long to figure out, but I understand this bug now.

First, here's a fairly reliable repro:

cespare@356eff0

(After checking that out, you can follow the directions in the commit message.)

The problem is that the objNodes sync.Map in the unused package is accessed in a racy way here:

go-tools/unused/unused.go, lines 870 to 885 at f51cd49:

if node, ok := g.Nodes.Load(obj); ok {
	return node.(*Node), false
}
if obj, ok := obj.(types.Object); ok {
	key := objNodeKeyFor(g.fset, obj)
	if o, ok := g.objNodes.Load(key); ok {
		onode := o.(*Node)
		return onode, false
	}
	node = g.newNode(ctx, obj)
	g.Nodes.Store(obj, node)
	g.objNodes.Store(key, node)
	return node, true
}

The problem happens something like this (assuming we have mypkg with a function xxx that is used only in the test):

  • Though we're running on a single package, packages.Load gives us 3 packages (see the sketch after this list): the package under test, the test variant of the package (the one that includes the _test.go files), and the synthetic mypkg.test package. (That last one doesn't matter for our purposes.)
  • The two variants of the package under test are inserted into the unused graph in parallel.
  • When we find the *types.Func for xxx (in both packages), both goroutines can enter (*Graph).node for the same object at the same time.
  • The goroutines race on the load and subsequent store of g.objNodes: they construct the same key, both loads come up empty, and then both goroutines store a new node. Both nodes end up in g.Nodes.
  • Now, depending on which goroutine lost the store race, one of the duplicate funcs in the graph will appear unused.
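
The three-package behavior is just what golang.org/x/tools/go/packages does when test variants are requested (a minimal sketch; example.com/mypkg stands in for the real package):

package main

import (
	"fmt"

	"golang.org/x/tools/go/packages"
)

func main() {
	cfg := &packages.Config{
		Mode:  packages.NeedName,
		Tests: true, // also load test variants, as staticcheck does
	}
	pkgs, err := packages.Load(cfg, "example.com/mypkg")
	if err != nil {
		panic(err)
	}
	// Typically prints three IDs: the package itself, the variant that
	// includes the _test.go files, and the synthetic mypkg.test main
	// package.
	for _, pkg := range pkgs {
		fmt.Println(pkg.ID)
	}
}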

In fact, in my testing I found that it's fairly common to have duplicate values in the graph that should have been deduplicated. Most of the time this doesn't trigger a false positive, since the value is used in both packages that raced to add it to the graph. It takes the somewhat unusual condition of an unexported function in the non-test package that is used only from the test variant.
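
To see the race in isolation, here's a standalone sketch of the same Load-then-Store shape on a sync.Map (not staticcheck's actual code):

package main

import (
	"fmt"
	"sync"
)

type node struct{ id int }

func main() {
	var objNodes sync.Map

	// Two goroutines look up the same key, mirroring the two package
	// variants inserting the same object into the graph.
	results := make(chan *node, 2)
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Racy check-then-act: both goroutines can miss the Load
			// and then each Store its own node under the same key.
			if n, ok := objNodes.Load("xxx"); ok {
				results <- n.(*node)
				return
			}
			n := &node{id: i}
			objNodes.Store("xxx", n)
			results <- n
		}(i)
	}
	wg.Wait()
	close(results)

	distinct := map[*node]bool{}
	for n := range results {
		distinct[n] = true
	}
	// Usually prints 1, but with an unlucky interleaving it prints 2:
	// the graph now holds two nodes for one object. LoadOrStore (or a
	// mutex around a plain map) would make the lookup-and-insert atomic.
	fmt.Println("distinct nodes:", len(distinct))
}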

dominikh removed the waiting-for-feedback label Nov 7, 2019
cespare pushed a commit to liftoffio/dominikh-go-tools that referenced this issue Nov 8, 2019
The 'node' function non-atomically loaded an existing node or created
a new one. This could lead to two goroutines each creating new nodes.
This would ultimately lead to false positives when checking a package
and its test variant, because objects wouldn't be deduplicated
correctly.

Instead of using LoadOrStore, we switch away from sync.Map completely
and use coarse locking on normal maps. Using LoadOrStore would require
us to allocate nodes just to throw them away afterwards, at a high
rate. At the same time, using normal maps + a mutex does not have a
measurable performance impact – as determined by benchmarking against
the standard library.

We also remove the context-local typeNodes map. It served as a
lock-free fast path, but benchmarking shows that it has no measurable
performance impact, either.

Thanks to Caleb Spare for uncovering this fairly embarrassing blunder.

Closes gh-642
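
In the spirit of that change, the lookup-or-create step becomes a single critical section over plain maps, roughly like this (an illustrative sketch, not the real unused.Graph API):

package main

import (
	"fmt"
	"sync"
)

type node struct{ id int }

// graph guards its maps with one coarse mutex instead of using sync.Map.
type graph struct {
	mu    sync.Mutex
	nodes map[string]*node
	next  int
}

// nodeFor looks up or creates the node for key atomically, so two
// concurrent callers can never both create a node for the same key.
func (g *graph) nodeFor(key string) (n *node, isNew bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if n, ok := g.nodes[key]; ok {
		return n, false
	}
	n = &node{id: g.next}
	g.next++
	g.nodes[key] = n
	return n, true
}

func main() {
	g := &graph{nodes: map[string]*node{}}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.nodeFor("xxx")
		}()
	}
	wg.Wait()
	fmt.Println("nodes in graph:", len(g.nodes)) // always 1
}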