unused: handle reflect's Method and MethodByName #476
dominikh added a commit that referenced this issue on May 8, 2020:
This commit completely replaces the analysis runner of Staticcheck. It fixes several performance shortcomings, as well as subtle bugs in U1000.

To explain the behaviors of the old and new runners, assume that we're processing a package graph that looks like this:

```
    A
   ↙ ↘
  B   C
  ↓
  ⋮
  ↓
  X
```

Package A is the package we wish to check. Packages B and C are direct dependencies of A, and X is an indirect dependency of B, with potentially many packages between B and X.

In the old runner, we would process the graph in a single DFS pass. We would start processing A, see that it needed B and C, start loading B and C, and so forth. This approach would unnecessarily increase memory usage: package C would be held in memory, ready to be used by A, while the long chain from X to B was being processed. Furthermore, A may not need most of C's data in the first place, if A was already fully cached. Processing the graph top to bottom is also harder to parallelize efficiently.

The new runner, in contrast, first materializes the graph (the planning phase) and then executes it from the bottom up (the execution phase). Whenever a leaf node finishes execution, its data is cached on disk and then unloaded from memory. The only data kept in memory is the package's hash, so that its dependents can compute their own hashes. Next, all dependents that are ready to run (i.e. that have no more unprocessed leaf nodes) are executed. If a dependent decides that it needs information from its dependencies, it loads that information from disk again.

This approach drastically reduces peak memory usage, at a slight increase in CPU usage because of the repeated loading of data. However, knowing the full graph allows for more efficient parallelization, offsetting the increased CPU cost. It also favours the common case, where most packages will have up-to-date cached data.
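As a rough illustration of the planning and execution phases described above, here is a minimal sketch of a bottom-up scheduler. All names are illustrative and the hashing and on-disk cache are stand-ins; this is not Staticcheck's actual runner.

```go
// Minimal sketch of bottom-up graph execution: plan reverse edges and
// pending counts, then run every node whose dependencies have finished.
// Illustrative only; not Staticcheck's runner.
package main

import (
	"fmt"
	"sync"
)

type node struct {
	name string
	deps []*node // packages this node imports

	mu         sync.Mutex
	pending    int     // dependencies not yet processed
	dependents []*node // reverse edges, filled in during planning
	hash       string  // the only data kept in memory after execution
}

// plan materializes reverse edges and returns the leaves of the graph.
func plan(nodes []*node) []*node {
	var leaves []*node
	for _, n := range nodes {
		n.pending = len(n.deps)
		for _, d := range n.deps {
			d.dependents = append(d.dependents, n)
		}
		if n.pending == 0 {
			leaves = append(leaves, n)
		}
	}
	return leaves
}

// execute processes the graph bottom-up. In the real design, a finished
// node's data would be written to an on-disk cache and unloaded; only its
// hash stays in memory so that dependents can compute their own hashes.
func execute(leaves []*node) {
	var wg sync.WaitGroup
	var run func(n *node)
	run = func(n *node) {
		defer wg.Done()
		n.hash = "hash(" + n.name + ")" // stand-in for real hashing
		fmt.Println("processed", n.name)
		for _, d := range n.dependents {
			d.mu.Lock()
			d.pending--
			ready := d.pending == 0
			d.mu.Unlock()
			if ready {
				wg.Add(1)
				go run(d)
			}
		}
	}
	for _, l := range leaves {
		wg.Add(1)
		go run(l)
	}
	wg.Wait()
}

func main() {
	x := &node{name: "X"}
	b := &node{name: "B", deps: []*node{x}}
	c := &node{name: "C"}
	a := &node{name: "A", deps: []*node{b, c}}
	execute(plan([]*node{a, b, c, x}))
}
```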
Changes to unused

The 'unused' check (U1000 and U1001) has always been the odd one out. It is the only check that propagates information backwards in the import graph – that is, the sum of a package's importers determines which of its objects are considered used. Due to tests and test variants, this applies even when not operating in whole-program mode.

The way we implemented this was not only expensive – whole-program mode in particular needed to retain type information for all packages – it was also subtly wrong: because we cached all diagnostics of a package, we cached stale 'unused' diagnostics when an importer changed.

As part of writing the new analysis runner, we make several changes to 'unused' that ensure it behaves well and doesn't negate the performance improvements of the new runner.

The most obvious change is the removal of whole-program mode. The combination of correct caching and efficient cache usage means that we no longer have access to the information required to compute a whole-program solution. It never worked quite right anyway, being unaware of reflection and having to grossly over-estimate the set of used methods due to interfaces.

The normal mode of 'unused' now considers all exported package-level identifiers as used, even if they are declared within tests or package main. Treating exported functions in package main as unused has been wrong ever since the addition of the 'plugin' build mode. Doing so in tests may have been mostly correct (ignoring reflection), but continuing to do so would complicate the implementation for little gain.

In the new implementation, the per-package information that is cached for U1000 consists of two lists: the list of used objects and the list of unused objects. At the end of analysis, the lists of all packages get merged: if any package uses an object, it is considered used; otherwise, if any package didn't use an object, it is considered unused.

This list-based approach is only correct if the usedness of an exported object in one package doesn't depend on another package. Consider the following package layout:

foo.go:

```go
package pkg

func unexported() {}
```

export_test.go:

```go
package pkg

func Exported() { unexported() }
```

external_test.go:

```go
package pkg_test

import "pkg"

var _ = pkg.Exported
```

This layout has three packages: pkg, pkg [test] and pkg_test. Under unused's old logic, pkg_test would be responsible for marking pkg [test]'s Exported as used, which would transitively mark 'unexported' as used, too. However, with our list-based approach, we would get the following lists:

pkg:
  used:
  unused: unexported

pkg [test]:
  used:
  unused: unexported, Exported

pkg_test:
  used: Exported
  unused:

Merging these lists, we would never know that 'unexported' was used. Instead of using these lists, we would need to cache and resolve full graphs.

This problem does not exist for unexported objects: if a package is able to use an unexported object, that object must be declared in the same package, which means the package can resolve its own graph internally before generating the lists. For completeness, these are the correct lists:

pkg:
  used:
  unused: unexported

pkg [test]:
  used: Exported, unexported
  unused:

pkg_test:
  used: Exported
  unused:

(The inclusion of Exported in pkg_test is superfluous and may be optimized away at some point.)

As part of porting unused's tests, we discovered a flaky false negative, caused by an incorrect implementation of our version of types.Identical. We were still using types.Identical under the hood, which wouldn't correctly account for nested types. This has been fixed.

More changes to unused

Several planned improvements to 'unused' also made it easier to integrate with the new runner, which is why these changes are part of this commit. TODO

Closes gh-233
Closes gh-284
Closes gh-476
Closes gh-538
Closes gh-576
Closes gh-671
Closes gh-675
Closes gh-690
Closes gh-691
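To make the merge rule from the commit message concrete, here is a small sketch (hypothetical types, not Staticcheck's internal representation): an object used by any package is used, and only objects that some package reported as unused and no package used remain unused.

```go
// Sketch of merging per-package used/unused lists. Hypothetical types only.
package main

import "fmt"

type pkgResult struct {
	Used   []string
	Unused []string
}

// merge returns the objects that at least one package reported as unused
// and that no package used.
func merge(results []pkgResult) []string {
	used := map[string]bool{}
	reportedUnused := map[string]bool{}
	for _, r := range results {
		for _, obj := range r.Used {
			used[obj] = true
		}
		for _, obj := range r.Unused {
			reportedUnused[obj] = true
		}
	}
	var unused []string
	for obj := range reportedUnused {
		if !used[obj] {
			unused = append(unused, obj)
		}
	}
	return unused
}

func main() {
	// The corrected lists from the example above: pkg, pkg [test], pkg_test.
	results := []pkgResult{
		{Unused: []string{"unexported"}},           // pkg
		{Used: []string{"Exported", "unexported"}}, // pkg [test]
		{Used: []string{"Exported"}},               // pkg_test
	}
	fmt.Println(merge(results)) // prints []: unexported is used once the lists are merged
}
```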
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Input:
Current output:
Desired output: None.
Solution:
There isn't a perfect way to solve this, because we can't accurately track the flow of reflect.Value through all code. A good approximation would be to match the exact code pattern from the example. This would eliminate a lot of false positives without requiring much new code. The next step up would be tracking values created from reflect.ValueOf within a single function, which would catch some more false positives while only increasing the complexity of the algorithm a bit. The step beyond that would be inter-procedural analysis, which we're unlikely to do.
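For illustration, a hypothetical snippet of the kind of code in question (not taken from the original report): the method is only reachable through a constant MethodByName call, which intra-procedural tracking of the reflect.Value would be able to see.

```go
// Hypothetical example, not from the original report: T.Handle is only ever
// invoked through reflection, so a checker that ignores reflect.Value flow
// would flag it as unused.
package main

import (
	"fmt"
	"reflect"
)

type T struct{}

func (T) Handle() { fmt.Println("called via reflection") }

func main() {
	v := reflect.ValueOf(T{})
	// Tracking v within this single function is enough to see that the
	// constant name "Handle" marks T.Handle as used.
	v.MethodByName("Handle").Call(nil)
}
```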
We can also track the arguments to Method and MethodByName at different levels of precision. We will likely restrict ourselves to trivially known constants and fall back to marking all methods as used if the argument is dynamic. Method needs special care because we have to use the same ordering of methods as the reflect package. In practice, however, most (if not all) uses of Method will be dynamic anyway, iterating over all methods.
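A rough sketch of that classification follows, written against go/ast for brevity rather than Staticcheck's IR; a real checker would also need go/types to confirm that the receiver is a reflect.Value. A constant string argument to MethodByName pins down a single method, while a dynamic argument, or an index passed to Method, conservatively marks all methods as used.

```go
// Sketch only: classify Method/MethodByName call sites by whether their
// argument is trivially known. Uses go/ast for brevity; a real checker would
// use type information to confirm the receiver is a reflect.Value.
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

const src = `package p

import "reflect"

func f(v reflect.Value, name string) {
	v.MethodByName("Handle") // constant: only "Handle" needs to be marked used
	v.MethodByName(name)     // dynamic: mark all methods as used
	for i := 0; i < v.NumMethod(); i++ {
		v.Method(i).Call(nil) // dynamic index: mark all methods as used
	}
}
`

func main() {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		panic(err)
	}
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) != 1 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || (sel.Sel.Name != "Method" && sel.Sel.Name != "MethodByName") {
			return true
		}
		if lit, ok := call.Args[0].(*ast.BasicLit); ok && lit.Kind == token.STRING {
			name, _ := strconv.Unquote(lit.Value)
			fmt.Printf("%s: mark method %q as used\n", fset.Position(call.Pos()), name)
			return true
		}
		fmt.Printf("%s: dynamic %s call, mark all methods as used\n",
			fset.Position(call.Pos()), sel.Sel.Name)
		return true
	})
}
```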