go_repository: read configuration from WORKSPACE #529
@jayconrod Any advice on a generic way to discover the main WORKSPACE file from a repository rule?
I'm not sure what the best way to do this will be. I ran a quick test with a dummy repository rule, and the only environment variable that points to the main workspace was …
I have a working-ish patch, but it has some issues. Note to self: I owe one partially-broken example; get that uploaded tomorrow.
Okay, so here is what I've been using for the past few months: https://github.com/bazelbuild/bazel-gazelle/compare/master...asuffield:go-repository-workspace?expand=1

It has some flaws. Whenever WORKSPACE is touched for any reason, all go_repository rules will rerun. If rules are defined in a file other than WORKSPACE, this approach cannot work.

I think we need a Bazel feature to make this really clean, but I haven't figured out precisely what. It'll be something like "expose the parsed repository rules programmatically", but that's not possible today because of the way WORKSPACE parsing works. I had an old email thread with Laurent about a way out of this...
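For reference, the mechanism a patch like this has to lean on boils down to something like the sketch below (a minimal illustration, not the actual diff; names are illustrative):

```python
# Sketch of a repository rule that reads the main repository's WORKSPACE file.
def _go_repository_impl(ctx):
    # Resolving a label from the main repository ("@//") makes that file an
    # input of this repository rule, so any edit to WORKSPACE re-runs every
    # instance of the rule; this is exactly the flaw described above.
    workspace = ctx.read(ctx.path(Label("@//:WORKSPACE")))
    # ... parse repository names and import paths out of `workspace` and pass
    # them to Gazelle when generating BUILD files for this repository ...
    ctx.file("BUILD.bazel", "")  # placeholder so the repository is non-empty

go_repository = repository_rule(
    implementation = _go_repository_impl,
    attrs = {"importpath": attr.string()},
)
```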
Nice! I didn't know that …
4f524f2 adds the …
Re-running …
To be clear, I'm not sure what the overhead will be for re-evaluating all the rules; even without needing to download sources, it might not be acceptable. That said, the original purpose of the cache was to be able to change the Gazelle version or …
Declaring files referenced via repository_macro would require that data to be available to the repository rule, which is a nest of complexity (doable, but tough). Rebuilding all go_repository rules has seemed tolerable if you have sha256 entries for every download, and frustrating if you have to re-download. I don't have a good solution for git repos (I'm doing them via http instead). The correct behaviour here would be to rebuild them if and only if some repo names or import paths have changed in WORKSPACE, which is a level of subtlety not currently possible without some new Bazel features.
I'll just note that this is true in general: repo rules are not hermetic and change behaviour when things on the network give different results (canonical example: somebody moves a git tag that your rule references). Still, we would like to be cache-correct against local files.
Another option we have here is to use a copy of the main go.mod file within every go repository that uses modules. With this approach, calls to go list will use the proper source of truth, which should fix the related issues. I tested this approach in our repo, and it worked. Using go.mod instead of WORKSPACE should make it so we only need to re-evaluate go_repository rules when it is actually necessary (the go.mod has changed). The downsides of this approach are that we may run into weird issues if WORKSPACE and go.mod are out of sync. Also, we will still have many calls to go list, which can possibly go out to the network.
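A rough sketch of how that copy could be made inside the repository rule (this is an assumption about the mechanism, not the change that was actually tested):

```python
def _go_repository_impl(ctx):
    # Make the main module's go.mod visible inside this external repository so
    # that `go list` resolves dependencies against the same versions the main
    # module uses. How the go tool is pointed at it (e.g. -modfile) is a
    # separate question.
    ctx.symlink(ctx.path(Label("@//:go.mod")), "main_go.mod")
    # ... fetch sources and run `go list` / Gazelle as before ...
```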
Oh, how about if we just make go.mod authoritative and use that directly to generate the repo rules?
The problem of unnecessary rebuilds still exists if we depend on the go.mod file. Why don't we just add an option to pass in all of the known imports to every go_repository rule? I will try to code this up when I get a chance and post it here.
I did a quick experiment with github.com/gohugoio/hugo. I split go_repository_tools and go_repository_cache into their own .bzl files so that changes to go_repository.bzl won't invalidate them. 22.5s doesn't seem totally unreasonable to me for a workspace change in a medium sized project.

About WORKSPACE vs go.mod as the source of truth: sorry, it's got to be WORKSPACE, since it can contain directives and customizations on top of what's imported from go.mod. If the delay is too much of a problem, we can provide a way for users to point to a .bzl configuration file instead of WORKSPACE (for example, one that contains a macro with all the go_repository rules).

I'll try and hack this together today.
'gazelle fix' and 'gazelle update' now accept -repo_config, the path to a file where information about repositories can be loaded. By default, this is WORKSPACE in the repository root directory. 'gazelle fix' and 'gazelle update-repos' still update the WORKSPACE file in the repository root directory when this flag is set.

go_repository passes the path to @//:WORKSPACE to -repo_config. go_repository resolves @//:WORKSPACE and any files mentioned in '# gazelle:repository_macro' directives. When these files change, all go_repository rules will be invalidated. It should not be necessary to re-download cached repositories (except vcs repositories; see #549).

On a Macbook Pro, it takes about 22.5s to re-evaluate 70 cached, invalidated go_repository rules for github.com/gohugoio/hugo. If this becomes a problem for large projects, we can provide a way to disable or limit this behavior in the future.

go_repository_tools and go_repository_cache are moved to their own .bzl files. Changes in go_repository.bzl should not invalidate these in the future.

Fixes #529
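To make the repository_macro setup concrete, the directive lives in WORKSPACE and names a macro file and macro; the file and macro names below are conventional placeholders:

```python
# WORKSPACE
load("//:repositories.bzl", "go_repositories")

# gazelle:repository_macro repositories.bzl%go_repositories
go_repositories()
```

With that directive, go_repository resolves repositories.bzl in addition to WORKSPACE, and gazelle fix/update can be pointed at a different file via -repo_config as described in the commit message above.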
My main concern with this change is the implications it has for our CI builds. Our CI runs a changed-target calculation to figure out what it should build and test. So now a WORKSPACE change will mean changes to every go_repository rule. So far the best optimization I can think of is if, instead, we pass in all of the known imports to every go_repository rule. I am hoping there is a better solution available here, because as it is right now I don't think we can take this downstream.
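For context, a changed-target calculation of this kind usually boils down to something like the query below; this is a generic sketch, not necessarily this CI's exact pipeline, and the file path is made up:

```sh
# Everything that transitively depends on a changed source file is considered
# "changed" and gets rebuilt and retested.
bazel query 'rdeps(//..., //cmd/server:main.go)' --output label
```

Once WORKSPACE is an input of every go_repository rule, a WORKSPACE edit makes every external Go dependency, and everything that depends on one, show up in that set.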
Could you elaborate on how this works? I'm expecting most people run something like …
This doesn't seem like a good user experience. WORKSPACE would be O(n^2) in size with the number of go_repository rules. Also, I expect there will be other reasons why WORKSPACE configuration will be necessary. #132 would still be useful for declaring a repository name and import path for repositories fetched with …
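For example, with 300 go_repository rules each carrying import mappings for the other 299, WORKSPACE would hold roughly 90,000 mapping entries.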
I think this can be tweaked a little bit to point to a file other than WORKSPACE.
We run a Bazel query twice (…). During …
The main thing we need here is to reduce the number of unnecessary rebuilds. Pointing to a .bzl file with the go_repository rules … I still think we could do better... a …
This seems like it should be fine, but I might be missing something. You're hashing the output of …
This seems like the most promising approach to me. I'll implement this soon, when I have some spare bandwidth. I still think WORKSPACE will be the best option for most folks in the common case, but such a file could be generated from WORKSPACE with a small local modification or a script.
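Such a file would look roughly like the sketch below; the repository name and tag are illustrative, not taken from this thread:

```python
# repositories.bzl
load("@bazel_gazelle//:deps.bzl", "go_repository")

def go_repositories():
    go_repository(
        name = "com_github_pkg_errors",
        importpath = "github.com/pkg/errors",
        tag = "v0.8.1",
    )
    # ... one go_repository call per external Go dependency ...
```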
You are right... I was assuming the query output would list … Thanks for your help here, Jay.
Okay, this is almost usable for me now. My current bugbear is that I have to list all the repos in the root repository somewhere. In particular, if I'm calling a deps() function from some other repo, there's no way to tell gazelle to go scrape that function. repository_macro appears to be single-use, and also can't reference other repositories. This can probably be solved by letting me write things like:
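A hypothetical directive in the spirit of that request might look like the following; the external repo, file, and macro names are made up, and this is not syntax Gazelle supports today:

```python
# WORKSPACE
# gazelle:repository_macro @our_platform_deps//:deps.bzl%go_deps
```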
@asuffield You should be able to use … It will only be able to reference files within the main workspace, though. Referencing files in other workspaces is hard, since they may or may not have been fetched and may be out of date. I'm not sure whether it's technically possible for Gazelle to recursively invoke Bazel to fetch those repositories. It used to be forbidden, but now I think Bazel releases the lock on the output directory before it starts running a command. In any case, it would add a lot of complexity and slow things down, so I'd rather avoid doing that. #132 may be an adequate workaround for this if it were implemented.
Closing since this was fixed as part of #564. |
When go_repository runs Gazelle in an external repository, Gazelle should configure itself using the WORKSPACE file in the main repository. This would be primarily useful for learning names and import paths of other external repositories. It would reduce the need to go out to the network to find external repository root paths. This process is not only slow but error-prone in module mode. Module boundaries may shift across versions, and it's important to know the version that's actually being used. This is the root cause of #525.
If #12 is implemented, we'd probably use this as a starting point to discover other external repositories.
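As an illustration of the kind of information Gazelle would pick up from the main WORKSPACE, consider an entry like the one below (the repository and tag are examples, not taken from this issue):

```python
# WORKSPACE (main repository)
load("@bazel_gazelle//:deps.bzl", "go_repository")

go_repository(
    name = "org_golang_x_text",
    importpath = "golang.org/x/text",
    tag = "v0.3.2",
)
```

Seeing that entry, Gazelle running inside another external repository could resolve an import of golang.org/x/text/language to a target under @org_golang_x_text//language without asking the network where the repository root is.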