staticcheck: rare false positive in U1000 #642

Closed

cespare opened this issue Nov 4, 2019 · 6 comments
cespare commented Nov 4, 2019

I'm in the process of tracking down what I believe to be a rare, non-deterministic false positive in staticcheck. Unfortunately, this is an issue we've only seen in our CI system running on private code; I've been unable to reliably reproduce it so far.

This is the current version we're using:

$ staticcheck -debug.version
staticcheck 2019.2.3

Compiled with Go version: go1.13
Main module:
        honnef.co/go/[email protected] (sum: h1:3JgtbtFHMiCmsznwGVTUWbgGov+pVqnlf1dEJTNAXeM=)
Dependencies:
        github.com/BurntSushi/[email protected] (sum: h1:WXkYYl6Yr3qBf1K79EBnL4mak0OimBfB0XUf9Vl28OQ=)
        golang.org/x/[email protected] (sum: h1:MQEvx39qSf8vyrx3XRaOe+j1UDIzKwkYOVObRgGPVqI=)

We've seen this false positive occur with Go 1.13.4 and previously with Go 1.12.4.

We also previously saw the false positives with staticcheck 2019.2 and 2019.2.2, but not 2019.1.1 AFAICT.

We're running staticcheck as

staticcheck root/...

(That is, a package selection expression that includes all our Go packages.)

Our CI environment is running on linux/amd64.


Some things I noticed when I browsed the set of false positives:

We've seen 16 false positives in the ~4 months since we upgraded from staticcheck 2019.1.1 to 2019.2. During that time we've had tens or hundreds of thousands of runs in CI (don't have exact numbers available).

All these false positives share a common feature: they flag an unexported name which is defined in a non-test file and is referred to exactly once, in a test file. For example, 6/16 of the false positives were for a particular method similar to this:

// server.go
type server struct {...}
func (s *server) close() { ... }

// server_test.go
func TestXYZ(t *testing.T) {
	s := newTestServer(...)
	defer s.close() // only time s.close is used anywhere

	// ...
}

We have examples with methods, plain functions, and also vars.
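
For concreteness, here's a self-contained sketch of that shape (newTestServer is a hypothetical helper, not code from our actual repo):

// server.go
package mypkg

type server struct {
	closed bool
}

// close has no callers outside the test files.
func (s *server) close() {
	s.closed = true
}

// server_test.go
package mypkg

import "testing"

// newTestServer is a hypothetical helper that builds a server for a test.
func newTestServer(t *testing.T) *server {
	t.Helper()
	return &server{}
}

func TestXYZ(t *testing.T) {
	s := newTestServer(t)
	defer s.close() // the only use of close anywhere
}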

I'm working to try to reproduce this, but perhaps this is enough of a clue to give you a lead.

cespare added the false-positive and needs-triage labels Nov 4, 2019
cespare (Author) commented Nov 5, 2019

I have 384 CPUs working on this.

dominikh added the waiting-for-feedback label and removed the needs-triage label Nov 5, 2019
viswajithiii commented Nov 5, 2019

Can confirm that I've seen this too. FWIW, I thought this was intended behavior and not a false positive -- it seems like an unexported function used only in a test should be in an _test.go file. But the non-determinism was definitely not ideal.

cespare (Author) commented Nov 5, 2019

@viswajithiii that's an interesting point. It's true that the non-test package, considered in isolation, does have a genuinely unused function/var.

@dominikh can you confirm the intended behavior?

dominikh (Owner) commented Nov 5, 2019

An identifier declared in the non-test variant of a package can be marked as used by the test variant of the package. This is intended behavior (and part of why U1000 is special in staticcheck: it has to see all packages before it can make a decision).

There is definitely a camp that thinks these objects should be defined in the test variant if only tests use them, but not even the standard library abides by that. And xtests make the situation even more confusing. So we opted for the least noisy, least opinionated approach, which allows the greatest number of objects to be marked as used.

The non-determinism, however, is probably caused by a bug in the aforementioned special logic. Maybe we're racy, not waiting for the completion of all packages? Or maybe we're order-dependent for some reason.

cespare (Author) commented Nov 5, 2019

Status update:

I've been able to reproduce within a few minutes by running a whole bunch of staticchecks in parallel. I was able to narrow it down a bit to a failing invocation like staticcheck -checks U1000 one/particular/package. If I run that with GOMAXPROCS=8 or so, it seems to fail after a few hundred runs on average.

For all these tests, each run of staticcheck has its own fresh XDG_CACHE_HOME, GOCACHE, and TMPDIR, so AFAIK there's no shared on-disk state in play.

In addition to 2019.2.3, I've reproduced this at master (f51cd49).

I've also reproduced this with staticcheck built with -race. There were no race warnings printed out, and it seemed to take longer to repro (not just wall time, but iterations -- I had to run about 3000 before I got a failure).

I've tried reproducing this with a minimal synthetic package but I couldn't get any failures after about 100k runs.
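
The stress setup is roughly the following (a sketch of the idea, not the exact harness I'm running; the package pattern is a placeholder):

package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

func main() {
	const workers = 8
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for run := 0; run < 1000; run++ {
				// Give every invocation its own cache and temp dirs so
				// there is no shared on-disk state between runs.
				dir, err := os.MkdirTemp("", "sc-stress")
				if err != nil {
					panic(err)
				}
				cmd := exec.Command("staticcheck", "-checks", "U1000", "one/particular/package")
				cmd.Env = append(os.Environ(),
					"XDG_CACHE_HOME="+dir,
					"GOCACHE="+dir+"/gocache",
					"TMPDIR="+dir,
				)
				// A clean run prints nothing, so any output (findings
				// or errors) is surfaced as a hit.
				if out, _ := cmd.CombinedOutput(); len(out) > 0 {
					fmt.Printf("worker %d, run %d:\n%s", w, run, out)
				}
				os.RemoveAll(dir)
			}
		}(w)
	}
	wg.Wait()
}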

cespare added a commit to cespare/go-tools that referenced this issue Nov 7, 2019
Run with:

(cd ./cmd/staticcheck && go build)
rm -rf ~/.cache/staticcheck
./cmd/staticcheck/staticcheck -checks U1000 ./pattern

This won't always trigger all 3 false positives, but it usually triggers
at least some.
cespare (Author) commented Nov 7, 2019

Took me way too long to figure out, but I understand this bug now.

First, here's a fairly reliable repro:

cespare@356eff0

(After checking that out, you can follow the directions in the commit message.)

The problem is that the objNodes sync.Map in the unused package is accessed in a racy way here:

go-tools/unused/unused.go, lines 870 to 885 at f51cd49:

if node, ok := g.Nodes.Load(obj); ok {
	return node.(*Node), false
}
if obj, ok := obj.(types.Object); ok {
	key := objNodeKeyFor(g.fset, obj)
	if o, ok := g.objNodes.Load(key); ok {
		onode := o.(*Node)
		return onode, false
	}
	node = g.newNode(ctx, obj)
	g.Nodes.Store(obj, node)
	g.objNodes.Store(key, node)
	return node, true
}

The problem happens something like this (assuming we have mypkg with a function xxx that is used only in the test):

  • Though we're running on a single package, packages.Load gives us 3 packages (see the sketch after this list): the package under test, the test variant of the package (the one that includes the _test.go files), and the synthetic mypkg.test package. (That last one doesn't matter for our purposes.)
  • The two variants of the package under test are inserted into the unused graph in parallel.
  • When we find the *types.Func for xxx (in both packages), both goroutines can enter (*Graph).node for the same object at the same time.
  • The goroutines race on the load and subsequent store of g.objNodes: they construct the same key, both loads come up empty, and then both goroutines store a new node. Both nodes end up in g.Nodes.
  • Now, depending on which goroutine lost the store race, one of the duplicate funcs in the graph will appear unused.
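
The three-package behavior is just what golang.org/x/tools/go/packages does when test variants are requested (a minimal sketch; example.com/mypkg stands in for the real package):

package main

import (
	"fmt"

	"golang.org/x/tools/go/packages"
)

func main() {
	cfg := &packages.Config{
		Mode:  packages.NeedName,
		Tests: true, // also load test variants, as staticcheck does
	}
	pkgs, err := packages.Load(cfg, "example.com/mypkg")
	if err != nil {
		panic(err)
	}
	// Typically prints three IDs: the package itself, the variant that
	// includes the _test.go files, and the synthetic mypkg.test main
	// package.
	for _, pkg := range pkgs {
		fmt.Println(pkg.ID)
	}
}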

In fact, in my testing I found that it's fairly common to have duplicate values in the graph that should have been deduplicated. Most of the time this doesn't trigger a false positive, since the value is used in both packages that raced to add it to the graph. It takes the somewhat unusual condition of an unexported function in the non-test package that is used only from the test variant.
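
To see the race in isolation, here's a standalone sketch of the same Load-then-Store shape on a sync.Map (not staticcheck's actual code):

package main

import (
	"fmt"
	"sync"
)

type node struct{ id int }

func main() {
	var objNodes sync.Map

	// Two goroutines look up the same key, mirroring the two package
	// variants inserting the same object into the graph.
	results := make(chan *node, 2)
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Racy check-then-act: both goroutines can miss the Load
			// and then each Store its own node under the same key.
			if n, ok := objNodes.Load("xxx"); ok {
				results <- n.(*node)
				return
			}
			n := &node{id: i}
			objNodes.Store("xxx", n)
			results <- n
		}(i)
	}
	wg.Wait()
	close(results)

	distinct := map[*node]bool{}
	for n := range results {
		distinct[n] = true
	}
	// Usually prints 1, but with an unlucky interleaving it prints 2:
	// the graph now holds two nodes for one object. LoadOrStore (or a
	// mutex around a plain map) would make the lookup-and-insert atomic.
	fmt.Println("distinct nodes:", len(distinct))
}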

dominikh removed the waiting-for-feedback label Nov 7, 2019
cespare pushed a commit to liftoffio/dominikh-go-tools that referenced this issue Nov 8, 2019
The 'node' function non-atomically loaded an existing node or created
a new one. This could lead to two goroutines each creating new nodes.
This would ultimately lead to false positives when checking a package
and its test variant, because objects wouldn't be deduplicated
correctly.

Instead of using LoadOrStore, we switch away from sync.Map completely
and use coarse locking on normal maps. Using LoadOrStore would require
us to allocate nodes just to throw them away afterwards, at a high
rate. At the same time, using normal maps + a mutex does not have a
measurable performance impact – as determined by benchmarking against
the standard library.

We also remove the context-local typeNodes map. It served as a
lock-free fast path, but benchmarking shows that it has no measurable
performance impact, either.

Thanks to Caleb Spare for uncovering this fairly embarrassing blunder.

Closes gh-642
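
In the spirit of that change, the lookup-or-create step becomes a single critical section over plain maps, roughly like this (an illustrative sketch, not the real unused.Graph API):

package main

import (
	"fmt"
	"sync"
)

type node struct{ id int }

// graph guards its maps with one coarse mutex instead of using sync.Map.
type graph struct {
	mu    sync.Mutex
	nodes map[string]*node
	next  int
}

// nodeFor looks up or creates the node for key atomically, so two
// concurrent callers can never both create a node for the same key.
func (g *graph) nodeFor(key string) (n *node, isNew bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if n, ok := g.nodes[key]; ok {
		return n, false
	}
	n = &node{id: g.next}
	g.next++
	g.nodes[key] = n
	return n, true
}

func main() {
	g := &graph{nodes: map[string]*node{}}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.nodeFor("xxx")
		}()
	}
	wg.Wait()
	fmt.Println("nodes in graph:", len(g.nodes)) // always 1
}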