Implement a new analysis runner
This commit completely replaces the analysis runner of Staticcheck. It
fixes several performance shortcomings, as well as subtle bugs in U1000.

To explain the behaviors of the old and new runners, assume that we're
processing a package graph that looks like this:

	  A
	 ↙ ↘
	B   C
	↓
	⋮
	↓
	X

Package A is the package we wish to check. Packages B and C are direct
dependencies of A, and X is an indirect dependency of B, with
potentially many packages between B and X.

In the old runner, we would process the graph in a single DFS pass. We
would start processing A, see that it needed B and C, start loading B
and C, and so forth. This approach would unnecessarily increase memory
usage. Package C would be held in memory, ready to be used by A, while
the long chain from X to B was being processed. Furthermore, A might
not need most of C's data in the first place if A was already fully
cached. Finally, processing the graph top to bottom is harder to
parallelize efficiently.

The new runner, in contrast, first materializes the graph (the
planning phase) and then executes it from the bottom up (the execution
phase). Whenever a leaf node finishes execution, its data is cached on
disk and then unloaded from memory. The only data kept in memory is the
package's hash, so that its dependents can compute their own hashes.

Next, all dependents that are ready to run (i.e. that have no more
unprocessed dependencies) are executed. If a dependent decides that it
needs information about its dependencies, it loads them from disk
again. This approach drastically reduces peak memory usage, at the cost
of a slight increase in CPU usage because data is loaded repeatedly.
However, knowing the full graph allows for more efficient
parallelization, offsetting the increased CPU cost. It also favours
the common case, where most packages will have up-to-date cached data.

Changes to unused

The 'unused' check (U1000 and U1001) has always been the odd one out.
It is the only check that propagates information backwards in the
import graph – that is, the sum of a package's importers determines
which objects in the package are considered used. Due to tests and test
variants, this applies even when not operating in whole-program mode.

The way we implemented this was not only expensive – whole-program
mode in particular needed to retain type information for all packages
– it was also subtly wrong. Because we cached all diagnostics of a
package, we cached stale 'unused' diagnostics when an importer
changed.

As part of writing the new analysis runner, we make several changes to
'unused' that make sure it behaves well and doesn't negate the
performance improvements of the new runner.

The most obvious change is the removal of whole-program mode. The
combination of correct caching and efficient cache usage means that we
no longer have access to the information required to compute a
whole-program solution. It never worked quite right, anyway, being
unaware of reflection, and having to grossly over-estimate the set of
used methods due to interfaces.

The normal mode of 'unused' now considers all exported package-level
identifiers as used, even if they are declared within tests or package
main. Treating exported functions in package main as unused has been
wrong ever since the addition of the 'plugin' build mode. Doing so in
tests may have been mostly correct (ignoring reflection), but
continuing to do so would complicate the implementation for little
gain.

In the new implementation, the per-package information that is cached
for U1000 consists of two lists: the list of used objects and the list
of unused objects. At the end of analysis, the lists of all packages
get merged: if any package uses an object, it is considered used.
Otherwise, if any package didn't use an object, it is considered
unused.

This list-based approach is only correct if the usedness of an
exported object in one package doesn't depend on another package.

Consider the following package layout:

	foo.go:
	package pkg

	func unexported() {}

	export_test.go:
	package pkg

	func Exported() { unexported() }

	external_test.go:
	package pkg_test

	import "pkg"

	var _ = pkg.Exported

This layout has three packages: pkg, pkg [test] and pkg_test. Under
unused's old logic, pkg_test would be responsible for marking pkg
[test]'s Exported as used. This would transitively mark 'unexported'
as used, too. However, with our list-based approach, we would get the
following lists:

pkg:
  used:
  unused: unexported

pkg [test]:
  used:
  unused: unexported, Exported

pkg_test:
  used: Exported
  unused:

Merging these lists, we would never know that 'unexported' was used.
Instead of using these lists, we would need to cache and resolve full
graphs.

This problem does not exist for unexported objects. If a package is
able to use an unexported object, it must exist within the same
package, which means it can internally resolve the package's graph
before generating the lists.

For completeness, these are the correct lists:

pkg:
  used:
  unused: unexported

pkg [test]:
  used: Exported, unexported
  unused:

pkg_test:
  used: Exported
  unused:

(The inclusion of Exported in pkg_test is superfluous and may be
optimized away at some point.)

As part of porting unused's tests, we discovered a flaky false
negative, caused by an incorrect implementation of our version of
types.Identical. We were still using types.Identical under the hood,
which wouldn't correctly account for nested types. This has been fixed.

Closes gh-233
Closes gh-284
Closes gh-476
Closes gh-538
Closes gh-576
Closes gh-671
Closes gh-690
Closes gh-691
dominikh committed May 7, 2020
1 parent 009a146 commit d47f0c0
Showing 69 changed files with 3,553 additions and 2,958 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -7,7 +7,7 @@ jobs:
     strategy:
       matrix:
         os: ["windows-latest", "ubuntu-latest", "macOS-latest"]
-        go: ["1.12.x", "1.13.x"]
+        go: ["1.13.x", "1.14.x"]
     runs-on: ${{ matrix.os }}
     steps:
     - uses: actions/checkout@v1
10 changes: 4 additions & 6 deletions cmd/staticcheck/staticcheck.go
@@ -6,7 +6,6 @@ import (
 	"os"
 
 	"golang.org/x/tools/go/analysis"
-	"honnef.co/go/tools/lint"
 	"honnef.co/go/tools/lint/lintutil"
 	"honnef.co/go/tools/simple"
 	"honnef.co/go/tools/staticcheck"
@@ -16,7 +15,6 @@ import (
 
 func main() {
 	fs := lintutil.FlagSet("staticcheck")
-	wholeProgram := fs.Bool("unused.whole-program", false, "Run unused in whole program mode")
 	debug := fs.String("debug.unused-graph", "", "Write unused's object graph to `file`")
 	fs.Parse(os.Args[1:])
 
@@ -31,14 +29,14 @@ func main() {
 		cs = append(cs, v)
 	}
 
-	u := unused.NewChecker(*wholeProgram)
+	cs = append(cs, unused.Analyzer)
 	if *debug != "" {
 		f, err := os.OpenFile(*debug, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0666)
 		if err != nil {
 			log.Fatal(err)
 		}
-		u.Debug = f
+		unused.Debug = f
 	}
-	cums := []lint.CumulativeChecker{u}
-	lintutil.ProcessFlagSet(cs, cums, fs)
+
+	lintutil.ProcessFlagSet(cs, fs)
 }
62 changes: 57 additions & 5 deletions code/code.go
@@ -2,13 +2,15 @@
 package code
 
 import (
+	"bytes"
 	"flag"
 	"fmt"
 	"go/ast"
 	"go/constant"
 	"go/token"
 	"go/types"
 	"strings"
+	"sync"
 
 	"golang.org/x/tools/go/analysis"
 	"golang.org/x/tools/go/analysis/passes/inspect"
@@ -17,9 +19,55 @@ import (
 	"honnef.co/go/tools/facts"
 	"honnef.co/go/tools/go/types/typeutil"
 	"honnef.co/go/tools/ir"
-	"honnef.co/go/tools/lint"
 )
 
+var bufferPool = &sync.Pool{
+	New: func() interface{} {
+		buf := bytes.NewBuffer(nil)
+		buf.Grow(64)
+		return buf
+	},
+}
+
+func FuncName(f *types.Func) string {
+	buf := bufferPool.Get().(*bytes.Buffer)
+	buf.Reset()
+	if f.Type() != nil {
+		sig := f.Type().(*types.Signature)
+		if recv := sig.Recv(); recv != nil {
+			buf.WriteByte('(')
+			if _, ok := recv.Type().(*types.Interface); ok {
+				// gcimporter creates abstract methods of
+				// named interfaces using the interface type
+				// (not the named type) as the receiver.
+				// Don't print it in full.
+				buf.WriteString("interface")
+			} else {
+				types.WriteType(buf, recv.Type(), nil)
+			}
+			buf.WriteByte(')')
+			buf.WriteByte('.')
+		} else if f.Pkg() != nil {
+			writePackage(buf, f.Pkg())
+		}
+	}
+	buf.WriteString(f.Name())
+	s := buf.String()
+	bufferPool.Put(buf)
+	return s
+}
+
+func writePackage(buf *bytes.Buffer, pkg *types.Package) {
+	if pkg == nil {
+		return
+	}
+	s := pkg.Path()
+	if s != "" {
+		buf.WriteString(s)
+		buf.WriteByte('.')
+	}
+}
+
 type Positioner interface {
 	Pos() token.Pos
 }
@@ -34,7 +82,7 @@ func CallName(call *ir.CallCommon) string {
 		if !ok {
 			return ""
 		}
-		return lint.FuncName(fn)
+		return FuncName(fn)
 	case *ir.Builtin:
 		return v.Name()
 	}
@@ -244,12 +292,12 @@ func CallNameAST(pass *analysis.Pass, call *ast.CallExpr) string {
 		if !ok {
 			return ""
 		}
-		return lint.FuncName(fn)
+		return FuncName(fn)
 	case *ast.Ident:
 		obj := pass.TypesInfo.ObjectOf(fun)
 		switch obj := obj.(type) {
 		case *types.Func:
-			return lint.FuncName(obj)
+			return FuncName(obj)
 		case *types.Builtin:
 			return obj.Name()
 		default:
@@ -472,7 +520,11 @@ func MayHaveSideEffects(pass *analysis.Pass, expr ast.Expr, purity facts.PurityR
 }
 
 func IsGoVersion(pass *analysis.Pass, minor int) bool {
-	version := pass.Analyzer.Flags.Lookup("go").Value.(flag.Getter).Get().(int)
+	f, ok := pass.Analyzer.Flags.Lookup("go").Value.(flag.Getter)
+	if !ok {
+		panic("requested Go version, but analyzer has no version flag")
+	}
+	version := f.Get().(int)
 	return version >= minor
 }
 
138 changes: 106 additions & 32 deletions go/types/typeutil/identical.go
@@ -4,23 +4,80 @@ import (
 	"go/types"
 )
 
-// Identical reports whether x and y are identical types.
 // Unlike types.Identical, receivers of Signature types are not ignored.
 // Unlike types.Identical, interfaces are compared via pointer equality (except for the empty interface, which gets deduplicated).
 // Unlike types.Identical, structs are compared via pointer equality.
-func Identical(x, y types.Type) (ret bool) {
-	if !types.Identical(x, y) {
-		return false
+func identical0(x, y types.Type) bool {
+	if x == y {
+		return true
 	}
 
 	switch x := x.(type) {
+	case *types.Basic:
+		// Basic types are singletons except for the rune and byte
+		// aliases, thus we cannot solely rely on the x == y check
+		// above. See also comment in TypeName.IsAlias.
+		if y, ok := y.(*types.Basic); ok {
+			return x.Kind() == y.Kind()
+		}
+
+	case *types.Array:
+		// Two array types are identical if they have identical element types
+		// and the same array length.
+		if y, ok := y.(*types.Array); ok {
+			// If one or both array lengths are unknown (< 0) due to some error,
+			// assume they are the same to avoid spurious follow-on errors.
+			return (x.Len() < 0 || y.Len() < 0 || x.Len() == y.Len()) && identical0(x.Elem(), y.Elem())
+		}
+
+	case *types.Slice:
+		// Two slice types are identical if they have identical element types.
+		if y, ok := y.(*types.Slice); ok {
+			return identical0(x.Elem(), y.Elem())
+		}
+
 	case *types.Struct:
-		y, ok := y.(*types.Struct)
-		if !ok {
-			// should be impossible
-			return true
+		if y, ok := y.(*types.Struct); ok {
+			return x == y
 		}
+
+	case *types.Pointer:
+		// Two pointer types are identical if they have identical base types.
+		if y, ok := y.(*types.Pointer); ok {
+			return identical0(x.Elem(), y.Elem())
+		}
+
+	case *types.Tuple:
+		// Two tuples types are identical if they have the same number of elements
+		// and corresponding elements have identical types.
+		if y, ok := y.(*types.Tuple); ok {
+			if x.Len() == y.Len() {
+				if x != nil {
+					for i := 0; i < x.Len(); i++ {
+						v := x.At(i)
+						w := y.At(i)
+						if !identical0(v.Type(), w.Type()) {
+							return false
+						}
+					}
+				}
+				return true
+			}
+		}
+
+	case *types.Signature:
+		// Two function types are identical if they have the same number of parameters
+		// and result values, corresponding parameter and result types are identical,
+		// and either both functions are variadic or neither is. Parameter and result
+		// names are not required to match.
+		if y, ok := y.(*types.Signature); ok {
+
+			return x.Variadic() == y.Variadic() &&
+				identical0(x.Params(), y.Params()) &&
+				identical0(x.Results(), y.Results()) &&
+				(x.Recv() != nil && y.Recv() != nil && identical0(x.Recv().Type(), y.Recv().Type()) || x.Recv() == nil && y.Recv() == nil)
+		}
-		return x == y
+
 	case *types.Interface:
 		// The issue with interfaces, typeutil.Map and types.Identical
 		//
@@ -43,33 +100,50 @@ func Identical(x, y types.Type) (ret bool) {
 		// pointers. This will obviously miss identical interfaces,
 		// but this only has a runtime cost, it doesn't affect
 		// correctness.
-		y, ok := y.(*types.Interface)
-		if !ok {
-			// should be impossible
-			return true
-		}
-		if x.NumEmbeddeds() == 0 &&
-			y.NumEmbeddeds() == 0 &&
-			x.NumMethods() == 0 &&
-			y.NumMethods() == 0 {
-			// all truly empty interfaces are the same
-			return true
+		if y, ok := y.(*types.Interface); ok {
+			if x.NumEmbeddeds() == 0 &&
+				y.NumEmbeddeds() == 0 &&
+				x.NumMethods() == 0 &&
+				y.NumMethods() == 0 {
+				// all truly empty interfaces are the same
+				return true
+			}
+			return x == y
 		}
-		return x == y
-	case *types.Signature:
-		y, ok := y.(*types.Signature)
-		if !ok {
-			// should be impossible
-			return true
+
+	case *types.Map:
+		// Two map types are identical if they have identical key and value types.
+		if y, ok := y.(*types.Map); ok {
+			return identical0(x.Key(), y.Key()) && identical0(x.Elem(), y.Elem())
 		}
-		if x.Recv() == y.Recv() {
-			return true
+
+	case *types.Chan:
+		// Two channel types are identical if they have identical value types
+		// and the same direction.
+		if y, ok := y.(*types.Chan); ok {
+			return x.Dir() == y.Dir() && identical0(x.Elem(), y.Elem())
 		}
-		if x.Recv() == nil || y.Recv() == nil {
-			return false
+
+	case *types.Named:
+		// Two named types are identical if their type names originate
+		// in the same type declaration.
+		if y, ok := y.(*types.Named); ok {
+			return x.Obj() == y.Obj()
 		}
-		return Identical(x.Recv().Type(), y.Recv().Type())
+
+	case nil:
+
 	default:
-		return true
+		panic("unreachable")
 	}
+
+	return false
 }
+
+// Identical reports whether x and y are identical types.
+// Unlike types.Identical, receivers of Signature types are not ignored.
+// Unlike types.Identical, interfaces are compared via pointer equality (except for the empty interface, which gets deduplicated).
+// Unlike types.Identical, structs are compared via pointer equality.
+func Identical(x, y types.Type) (ret bool) {
+	return identical0(x, y)
+}
10 changes: 5 additions & 5 deletions internal/cache/cache.go
@@ -77,7 +77,7 @@ func (c *Cache) fileName(id [HashSize]byte, key string) string {
 	return filepath.Join(c.dir, fmt.Sprintf("%02x", id[0]), fmt.Sprintf("%x", id)+"-"+key)
 }
 
-var errMissing = errors.New("cache entry not found")
+var ErrMissing = errors.New("cache entry not found")
 
 const (
 	// action entry file is "v1 <hex id> <hex out> <decimal size space-padded to 20 bytes> <unixnano space-padded to 20 bytes>\n"
@@ -124,7 +124,7 @@ func initEnv() {
 // saved file for that output ID is still available.
 func (c *Cache) Get(id ActionID) (Entry, error) {
 	if verify {
-		return Entry{}, errMissing
+		return Entry{}, ErrMissing
 	}
 	return c.get(id)
 }
@@ -138,7 +138,7 @@ type Entry struct {
 // get is Get but does not respect verify mode, so that Put can use it.
 func (c *Cache) get(id ActionID) (Entry, error) {
 	missing := func() (Entry, error) {
-		return Entry{}, errMissing
+		return Entry{}, ErrMissing
 	}
 	f, err := os.Open(c.fileName(id, "a"))
 	if err != nil {
@@ -196,7 +196,7 @@ func (c *Cache) GetFile(id ActionID) (file string, entry Entry, err error) {
 	file = c.OutputFile(entry.OutputID)
 	info, err := os.Stat(file)
 	if err != nil || info.Size() != entry.Size {
-		return "", Entry{}, errMissing
+		return "", Entry{}, ErrMissing
 	}
 	return file, entry, nil
 }
@@ -211,7 +211,7 @@ func (c *Cache) GetBytes(id ActionID) ([]byte, Entry, error) {
 	}
 	data, _ := ioutil.ReadFile(c.OutputFile(entry.OutputID))
 	if sha256.Sum256(data) != entry.OutputID {
-		return nil, entry, errMissing
+		return nil, entry, ErrMissing
 	}
 	return data, entry, nil
 }
3 changes: 3 additions & 0 deletions internal/passes/buildir/buildir.go
@@ -25,6 +25,9 @@ type willUnwind struct{}
 func (*willExit) AFact() {}
 func (*willUnwind) AFact() {}
 
+func (*willExit) String() string { return "will exit" }
+func (*willUnwind) String() string { return "will unwind" }
+
 var Analyzer = &analysis.Analyzer{
 	Name: "buildir",
 	Doc: "build IR for later passes",