Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/analysis: facts should be shareable between analyses #29616

Closed
dominikh opened this issue Jan 8, 2019 · 16 comments
Closed

x/tools/go/analysis: facts should be shareable between analyses #29616

dominikh opened this issue Jan 8, 2019 · 16 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@dominikh
Copy link
Member

dominikh commented Jan 8, 2019

go/analysis compares facts to export data in traditional compilation. However, it declares facts as private to an analysis, meaning a single analysis must produce and consume facts, and no other analyses can consume these facts.

I find this very limiting. Many facts are useful to more than one analysis. For example, staticcheck computes these "facts" (albeit in a different framework) for all functions: purity and whether a function
is supposed to terminate. These facts don't serve one specific analysis, but are rather used by many analyses to either find bugs, or eliminate false positives. Similarly, the "printf wrapper" fact by vet
could prove useful to other analyses.

I believe that go/analysis currently offers no workaround for this. The only API for sharing data between analyses are results. Results are intended for sharing information concerning a single package
between analyses. One could imagine (ab)using results for sharing facts (breaking the assumption that results only concern a single package). However, this is not currently feasible. For one, no
function for enumerating all facts known to an analysis exists. Secondly, even if such a function did exist, making all facts available via a result (likely a map) would be a costly
materialization of said facts (which may be cached on disk, and of which only a small subset may be accessed by any analysis).

Furthermore, the current API discourages people from making facts reusable in the first place. I claim that the writers of the printf check haven't thought about making their facts available via results, strictly considering them an implementation detail.

I propose changing the semantics of facts. They should be shared between analyses, and people should be encouraged to write stand-alone fact-finding analyses. People who do not want to make their facts publicly available, or only use them as an artifact in a single analysis, can continue not exporting the types.

I do not believe that expanding the available scope of facts will have a negative effect on performance. Facts can still be shared between address spaces and remain serializable.

/cc @ianthehat @matloob @alandonovan @mvdan @myitcv

@gopherbot gopherbot added this to the Unreleased milestone Jan 8, 2019
@dominikh dominikh added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 8, 2019
@alandonovan
Copy link
Contributor

The only API for sharing data between analyses are results. Results are intended for sharing information concerning a single package between analyses. One could imagine (ab)using results for sharing facts (breaking the assumption that results only concern a single package). However, this is not currently feasible. For one, no function for enumerating all facts known to an analysis exists. Secondly, even if such a function did exist, making all facts available via a result (likely a map) would be a costly
materialization of said facts (which may be cached on disk, and of which only a small subset may be accessed by any analysis).

Sharing results derived from facts is not an abuse, it's the intended way to do what you want in the current design, and this was deliberate. Results are the API that one analyzer presents to another. Results are limited to "information about a single package", but that also includes the public API of each package that it directly imports, plus parts of many other packages indirectly accessible through fields and methods. It should be enough to support most type-based analyses.

That there is no way to enumerate facts does seem like a real limitation, so perhaps we should add an iterator for them.

I appreciate and shared your concern about the cost of materializing all facts, but I could not find a way to implement fact serialization and merging that did not materialize all facts, so I had to let it go. At least today, facts are so few and so small that this has yet to be a noticeable cost, though of course that could change as they are adopted.

@dominikh
Copy link
Member Author

dominikh commented Jan 8, 2019

Wouldn't merging only require a list of keys, as opposed to all gob-decoded values?

@alandonovan
Copy link
Contributor

Yes, I suppose it should. But the expensive part of merging, at least today when facts are few but packages are potentially quite large, is enumerating the set of keys (objects) that are part of the API of the current package; see golang.org/x/tools/go/analysis/internal/facts/imports.go.

@dominikh
Copy link
Member Author

dominikh commented Jan 8, 2019

As a point of reference, for staticcheck std, staticcheck computes 733 facts (94 pure functions, 380 stubs, 43 functions that don't terminate, 216 deprecated objects) More facts will be added in the future.

I suppose thousands of facts, most of which are boolean, aren't a lot.

As for having to enumerate the set of keys, however: Why is this necessary? I get why it'll be necessary if we add a way of iterating over all facts. However, lacking this, all that is needed is a file format with individually addressable facts to get and set individual facts on demand?

@dominikh
Copy link
Member Author

@alandonovan Is there any chance of me convincing you that facts should not be private, or should we move forward with adding an API for iterating over facts?

@alandonovan
Copy link
Contributor

Sorry for the delay. I meant to think on this, but got distracted.

Making facts public does not increase the expressiveness of the API: one can accomplish the same things by exposing a result whose implementation relies on facts. So I have yet to hear a compelling reason to expose facts directly, but perhaps I misunderstand what you have in mind as an alternative.

all that is needed is a file format with individually addressable facts to get and set individual facts on demand

Not every driver implementation today can be assumed to use a serial file format. The single/multichecker implementations use an in-memory fact store, for example. Perhaps it would be no big loss if a change to the API forced all drivers to use an indexable file format as an intermediary.

An indexable file format might make random access cheaper for the analysis, especially if it uses only a few facts, but the driver will still need to visit all the facts from package P and each direct import of P in order to emit the resulting fact set for P; see https://github.com/golang/tools/blob/master/go/analysis/internal/facts/facts.go#L209. So I don't see how it makes things asymptotically cheaper.

Could you elaborate on the API and file format you have in mind?

I'm also happy to move forward with an API for iterating over facts. Do you want to sketch an implementation?

@dominikh
Copy link
Member Author

Making facts public does not increase the expressiveness of the API: one can accomplish the same things by exposing a result whose implementation relies on facts. So I have yet to hear a compelling reason to expose facts directly, but perhaps I misunderstand what you have in mind as an alternative.

It would not make the API more expressive, however it would make it simpler to use, and reduce the API surface of analyses. Instead of having to populate a Result, in addition to collecting facts, a user would only need to export the fact type. The decision between public and private would be a single character, much like in Go's type system, instead of having to write a loop and document a result type for each fact that one wishes to make public. Furthermore, if a single check sets multiple facts, the Result type (again likely to be a map) would require a bit field or struct type that wouldn't be necessary if facts were public.

To summarize, I think it should be much simpler to switch between private and public facts.

but the driver will still need to visit all the facts from package P and each direct import of P in order to emit the resulting fact set for P

I completely glossed over the fact that each package's export data would fully describe its dependencies (like in the compiler), hence my confusion as to why all facts had to be collected. Of course we are assuming that every implementation of fact stores follows our idea of self-contained export data. Other implementations could opt to use linked lists, or actual databases (relational or otherwise), in case people want to optimize for random access.

Not every driver implementation today can be assumed to use a serial file format. The single/multichecker implementations use an in-memory fact store, for example. Perhaps it would be no big loss if a change to the API forced all drivers to use an indexable file format as an intermediary.

The specific file format (if any) isn't important, only that it supports certain operations: random access, possibly iteration, a way to remove all facts associated with an object or package (probably a subset of iteration). The in-memory store trivially supports this, as does self-contained export data that is loaded into memory.

I'm also happy to move forward with an API for iterating over facts. Do you want to sketch an implementation?

The specifics of that API depend on the outcome of my push for random access, so let's hold off on that for now.

@matloob
Copy link
Contributor

matloob commented Mar 15, 2019

Hello!

I'm going to try to get a better idea of the pros/cons of this next week.

If I understand correctly, it seems to me that the main "con" are that each of the {package x analysis} is now responsible for propagating all the facts in its transitive set of analysis dependencies for the types in the package's export data and the package itself, and analyses no longer have control over what they propagate downstream.

I'm a little concerned about that, though I'm not sure how much of an issue it can be. But I'd rather lean on the side of being conservative about changing the API.

@matloob
Copy link
Contributor

matloob commented Mar 19, 2019

@dominikh If I understand correctly, the main reason to make facts sharable between analyses is as a convenience: Pass already contains a mapping from objects or facts, so sharing facts between analyses means there wouldn't be a need to create a new result type and map facts to analyses. So facts can be set and read more easily, and exported more easily.

Would you agree with that?

I think we might be able to keep the distinction between Facts and Results while still making it more convenient for users to export facts, maybe by creating a "common" Result type that makes facts available or something else like that. Maybe there wouldn't be a one character change to make facts visible to other analyses, but it could be a couple of lines.

Given that, I'd prefer to err on the side of not "expanding" the analysis API by making facts sharable if we don't have a stronger reason to do so.

One idea suggested by Alan is that if we could see what analyses would look like side-by-side written using the current Fact/Result mechanism, and written with "sharable" facts, we might be able to suggest helpers or improvements to make the Fact/Result mechanism easier to use.

@dominikh
Copy link
Member Author

Would you agree with that?

Yes. This is about convenience, and a specific mental model. Any analysis using my model (shareable facts) can also be implemented in terms of results. This does, however, still require expanding the API to add a missing part: an analysis being able to enumerate all its facts, in order to construct the result type.

And yes, we can absolutely build helpers to make the process of turning facts into results less painful. If that's the way the majority wants to go, I will concede. I just prefer my approach :-)

@matloob
Copy link
Contributor

matloob commented Mar 20, 2019

Yes, we'd have to be able to enumerate all facts to make the result type. We should definitely do that.

I'm not sure which way the majority wants to go. I'd be interested in hearing others' opinions about this. But I agree with Alan that we'd need more evidence to justify sharing facts.

Edit: this might be worth bringing up at the tools sync...

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/174379 mentions this issue: go/analysis: proposed fact iteration API

gopherbot pushed a commit to golang/tools that referenced this issue May 3, 2019
Updates golang/go#29616

Change-Id: Ibaf10526ea35f06b853ad55451a50750c764fab4
Reviewed-on: https://go-review.googlesource.com/c/tools/+/174379
Run-TryBot: Michael Matloob <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>
Reviewed-by: Ian Cottrell <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
@aaronbee
Copy link

My experience report: I'm trying to write an analysis tool that applies an additional check on all Print/Printf-like functions. This check is unique to my organization's code base. The printf analyzer does the work of marking all the functions I would want to validate with the printf.isWrapper fact. Since this is unexported my only option was to copy the code that applies this fact into my tool.

If the printf.isWrapper type were exported then I could just build my tool using the printf analyzer as a Requires and use pass.ImportObjectFact to discover which calls were print wrappers.

I believe https://golang.org/cl/174379 gets me what I need somewhat awkwardly. Presumably I would iterate over all the object facts, and take action when the fact implements fmt.Stringer and Fact.String() returns "printfWrapper" or "printWrapper" (this is what isWrapper.String() returns).

@alandonovan
Copy link
Contributor

According to the design, the intended way to reuse the results computed by the printf analyzer is not to depend on its facts, but to make it return a result that is an opaque object with an IsWrapper(f *types.Func) method. The implementation of that method would test whether f is a member of a set that the analyzer populates at the same moment that it exports a fact about f:

	// printf.go:262
	fn := w.obj
	var fact isWrapper
	if !pass.ImportObjectFact(fn, &fact) {
		fact.Printf = kind == kindPrintf
		pass.ExportObjectFact(fn, &fact) // here
		for _, caller := range w.callers {
			checkPrintfFwd(pass, caller.w, caller.call, kind)
		}

Facts are how one printf analysis serially communicates with another. Results are how a printf analysis exposes its deductions to other analyzers. It may seem redundant the same information is in both, but that's just this case; in general the structures of the two pieces of information may be quite different.

aaronbee added a commit to aaronbee/tools that referenced this issue Jul 15, 2019
printf.Result has IsPrint function telling the caller if a function is
a Print/Printf function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I28219337235acb025c50609bed9e1709ad68f5f1
aaronbee added a commit to aaronbee/tools that referenced this issue Jul 15, 2019
printf.Result has IsPrint function telling the caller if a function is
a Print/Printf function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I57833d5c65c417f2e4a761d9187fc133937f39fb
aaronbee added a commit to aaronbee/tools that referenced this issue Jul 16, 2019
printf.Result has IsPrint function telling the caller if a function is
a Print/Printf function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I57833d5c65c417f2e4a761d9187fc133937f39fb
aaronbee added a commit to aaronbee/tools that referenced this issue Jul 16, 2019
printf.Result allows clients to learn if a function is a Print/Printf
function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I57833d5c65c417f2e4a761d9187fc133937f39fb
aaronbee added a commit to aaronbee/tools that referenced this issue Jul 17, 2019
printf.Result allows clients to learn if a function is a Print/Printf
function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I57833d5c65c417f2e4a761d9187fc133937f39fb
@dominikh
Copy link
Member Author

Are there any objections to closing this issue? I am happy with the current solution (which was the addition of AllObjectFacts) and have adjusted my mental model to match Alan's.

@matloob
Copy link
Contributor

matloob commented Aug 13, 2019

Nope, I'll close! Thanks!

@matloob matloob closed this as completed Aug 13, 2019
aaronbee added a commit to aaronbee/tools that referenced this issue Aug 31, 2019
printf.Result allows clients to learn if a function is a Print/Printf
function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I57833d5c65c417f2e4a761d9187fc133937f39fb
aaronbee added a commit to aaronbee/tools that referenced this issue Aug 31, 2019
printf.Result allows clients to learn if a function is a Print/Printf
function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I57833d5c65c417f2e4a761d9187fc133937f39fb
aaronbee added a commit to aaronbee/tools that referenced this issue Sep 3, 2019
printf.Result allows clients to learn if a function is a Print/Printf
function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I57833d5c65c417f2e4a761d9187fc133937f39fb
gopherbot pushed a commit to golang/tools that referenced this issue Oct 28, 2019
printf.Result has IsPrint function telling the caller if a function is
a Print/Printf function or a wrapper of one.

This aids in developing Ananlyzer's applying checks on Print/Printf
functions.

Implements @alandonovan suggestion from
golang/go#29616 (comment)

Change-Id: I203d51f1fcab7d8574d9309c22b404f8e3de43db
GitHub-Last-Rev: 5cb9115
GitHub-Pull-Request: #138
Reviewed-on: https://go-review.googlesource.com/c/tools/+/186317
Reviewed-by: Michael Matloob <[email protected]>
@golang golang locked and limited conversation to collaborators Aug 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

5 participants