Feature request: --show-rulekey #7962
Comments
To give a little more context: this feature would be extremely useful. It would help produce a list of changed targets between two commits on CI, allowing correctly scoped builds (instead of //...) and other pre-land optimizations. As the previous discussion suggests, there is no easy way to detect changes in the build graph when it comes to modifications in Starlark files (WORKSPACE, BUILD, *.bzl, etc.). For example, I've built a reasonably working prototype that:
Although it is fast and requires only one bazel query, it may produce false negatives:
We've also considered another approach, which relies on an rbuildfiles query for each changed Starlark file. What makes it unacceptable is that any change to the WORKSPACE file would trigger pretty much a full repo rebuild, as all third-party dependencies are defined there. In addition to false positives, it has scalability issues: it needs to issue a separate query for each changed Starlark file, each taking ~0.5 sec, so a reasonably sized refactoring affecting 1000s of build files can spend 10s of minutes computing changed targets. Are there any reasons preventing us from adding this feature to Bazel core and printing a checksum next to each rule in the dependency graph? |
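To make the scalability concern concrete, here is a rough back-of-the-envelope sketch. It assumes the queries run serially and that each one takes the ~0.5 sec quoted above; the function name and the file counts are illustrative, not measurements.

```python
def estimated_minutes(changed_files: int, seconds_per_query: float = 0.5) -> float:
    """Wall-clock minutes if every changed Starlark file needs its own serial query."""
    return changed_files * seconds_per_query / 60


# Illustrative file counts only; real refactorings will vary.
for n in (100, 1200, 3000):
    print(f"{n} changed files -> about {estimated_minutes(n):.1f} min of querying")
```

Under these assumptions the cost grows linearly with the number of changed files, which is why a single-query approach is attractive.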
I agree imo that should be part of bazel core - @vitarb is there any chance you can share the prototype you have mentioned? As you may have seen we are interested in a performant solution to this and apart from including this into bazel core itself (which I hope will happen at some point) this imo seems like the most general version and least prone to errors, even if it still has some edge cases. |
@Globegitter, let me refine it a little bit and I will try to share something. On the other hand downsides are:
Since most of these issues can be mitigated by proper caching or other improvements, it seems to be a pretty good approach overall and can lead to a concise, VCS-independent solution. At the same time, having Bazel do the same would probably be more efficient. |
Bouncing over to the Core people. However, given that rule keys are not how Bazel works (analysis phase caching relies on Java object identity alone and it's only actions that have hashes, which are a function of the contents of their input files), I wouldn't hold my breath. |
@vitarb You don't happen to have anything that you can share? I've been looking for something that solves this problem but I've found no concrete solutions. Pants has a flag |
@lberki Would it be possible to get someone from the core team to comment on this issue? I'm wondering if there are some major technical hurdles that need to be solved to be able to add a digest for each target in either
|
/cc @ericfelly I would love this to happen, but it requires a lot of stars to align |
It sounds like this is a general request for finding the affected targets given a change to the repo. Is that right? Is the rule key stuff an actual requirement, or just one possible approach? |
to elaborate on @lberki 's response, buck's https://buck.build/concept/rule_keys.html are transitive but bazel's keys are non-transitive. the usefulness of transitive digests is, i presume, that you'd have this useful property for deciding when to rebuild test targets:
yes, that's my assessment too. |
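For readers unfamiliar with the transitive vs. non-transitive distinction raised above, here is a minimal sketch of a transitive digest. It is not how Buck or Bazel compute anything; the input format (a map from target label to bytes describing the target itself, plus a map of direct deps) is a hypothetical stand-in for whatever a query would produce.

```python
import hashlib


def transitive_digests(own_inputs, deps):
    """Return target -> hex digest covering the target and everything it depends on.

    own_inputs: target label -> bytes describing the target itself
                (hypothetical: e.g. rule attributes plus source file contents).
    deps:       target label -> list of direct dependency labels.
    Assumes the graph is acyclic and every dependency appears in own_inputs.
    """
    memo = {}

    def digest(target):
        if target in memo:
            return memo[target]
        h = hashlib.sha256()
        h.update(own_inputs[target])
        # Sort deps so the result does not depend on declaration order.
        for dep in sorted(deps.get(target, [])):
            h.update(dep.encode())
            h.update(digest(dep).encode())
        memo[target] = h.hexdigest()
        return memo[target]

    return {target: digest(target) for target in own_inputs}


deps = {"//a:bin": ["//a:lib"], "//a:test": ["//a:bin"]}
inputs_v1 = {
    "//a:lib": b"lib sources v1",
    "//a:bin": b"bin sources",
    "//a:test": b"test sources",
    "//b:other": b"unrelated",
}
inputs_v2 = {**inputs_v1, "//a:lib": b"lib sources v2"}

before = transitive_digests(inputs_v1, deps)
after = transitive_digests(inputs_v2, deps)
changed = sorted(t for t in after if after[t] != before[t])
print(changed)
```

Only the library's own inputs differ between the two versions, yet the binary and the test that depend on it also get new digests, while the unrelated target keeps its old one. A non-transitive key would have flagged the library alone. The recursion is fine for a sketch, but a very deep graph would need an iterative traversal.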
@ericfelly @haxorz In my case it's to figure out what artifacts changed between two commits so that we can deploy only changed artifacts. So the rule key is just one way to do this, but I think there are multiple ways to achieve this. I'm not familiar enough with Bazel internals to find the most idiomatic way to achieve this within bazel but I can suggest a few.
Some way to achieve this would fill a much needed gap for Bazel. In my case it would solve what deployment targets I should run and for large repos where |
@purkhusid but can you please confirm/deny the transitive part of my previous comment? i can see two general approaches to a "determine affected targets at source version A" oracle that lives outside* of bazel: (1) use specially-crafted repo-scale (2) have bazel dump a transitive hash of every target in the repo at versions am i missing something, or have i concisely summarized the two approaches? notes:
there are tradeoffs between these two general approaches. (2) unconditionally does repo-scale work, since it unconditionally computes and dumps a [transitive] hash of every target in the repo. contrast that with (1), which does basically no work for trivial changes. but the downside of (1) is the worst-case amount of work is perhaps higher since a repo-scale * i say "lives outside bazel" because bazel's incrementality engine is ofc its own oracle but in @vitarb's first comment they say they don't want to unconditionally run |
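Approach (2) above boils down to dumping one hash per target at each of the two versions and comparing the dumps. A minimal sketch of that comparison follows, assuming each dump has already been loaded into a dict of target label to digest string; the dump format and the function names are hypothetical.

```python
def affected_targets(before, after):
    """Targets in `after` that are new or whose digest differs from `before`."""
    return sorted(t for t, d in after.items() if before.get(t) != d)


def removed_targets(before, after):
    """Targets that existed in `before` but no longer exist in `after`."""
    return sorted(t for t in before if t not in after)


hashes_at_a = {"//a:lib": "111", "//a:test": "222", "//b:old": "333"}
hashes_at_b = {"//a:lib": "999", "//a:test": "222", "//c:new": "444"}

print(affected_targets(hashes_at_a, hashes_at_b))
print(removed_targets(hashes_at_a, hashes_at_b))
```

The comparison itself is cheap. The repo-scale cost noted above is in producing the two dumps, not in diffing them.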
@haxorz These 2 options do pretty much summarize it. Option (2) does sound like the most user friendly way of doing this. All attempts at (1) that I've seen so far require that you jump through various hoops and usually end up being very complicated as can be seen here for example: https://groups.google.com/forum/#!msg/bazel-discuss/I9udqWIcEdI/iczVgWLOBQAJ Going the (2) route would make this more native to bazel and a whole lot easier to do. But my guess is that creating transitive keys for each action/rule is a non-trivial task? |
correct, both in terms of amount of code that would need to be written and also in terms of the runtime cost of that code. expanding on the latter, i think we'd not want this code to run by default (and when it runs we don't want to store the full results inside of the bazel server either); e.g. going with your |
I work with @linzhp and @vitarb, for some context on why we use the list of changed targets between revisions and how it is implemented:
There's some more ideas, such as figuring out which targets changed directly vs which ones changed because a dependency changed, and test directly affected targets with more expensive features like msan or a race detector etc. Our current implementation is like the mentioned 2):
We chose this approach over 1) as it was more straightforward to implement and predict the performance and accuracy. The approach mentioned in 1) gets particularly tricky when dealing with changes to the workspace and rule implementations. We ended up choosing I'd love to see this supported in bazel at some point. One caveat of our approach is that our list of targets is getting rather long and unfortunately bazel doesn't have support for invocation with argfiles, so we use a bit of a bazelrc workaround to feed the list into a build step (#8609 (comment)). Meanwhile I'll see if we can make the separate go code we use for the above open source. |
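The idea mentioned above of separating targets that changed directly from those that changed only because a dependency changed can be sketched as follows. This is not the implementation described in this comment. It assumes two digests are available per target at each revision: one over the target's own inputs and one transitive. All names here are hypothetical.

```python
def classify_changes(own_before, own_after, trans_before, trans_after):
    """Split changed targets into (directly changed, changed only via a dependency).

    own_*:   target label -> digest of the target's own inputs (non-transitive).
    trans_*: target label -> transitive digest of the target and all its deps.
    """
    direct, indirect = [], []
    for target, digest in trans_after.items():
        if trans_before.get(target) == digest:
            continue  # nothing in this target's closure changed
        if own_before.get(target) != own_after.get(target):
            direct.append(target)
        else:
            indirect.append(target)
    return sorted(direct), sorted(indirect)


own_before = {"//a:lib": "l1", "//a:test": "t1", "//b:other": "o1"}
own_after = {"//a:lib": "l2", "//a:test": "t1", "//b:other": "o1"}
trans_before = {"//a:lib": "L1", "//a:test": "T1", "//b:other": "O1"}
trans_after = {"//a:lib": "L2", "//a:test": "T2", "//b:other": "O1"}

direct, indirect = classify_changes(own_before, own_after, trans_before, trans_after)
print(direct, indirect)
```

With a split like this, the more expensive checks could be reserved for the directly changed list, while the indirectly changed targets get the regular build and test.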
Thank you for that context. One relevant fact that I failed to mention is that we do have nascent / experimental code in place for top-down caching : c5c078c I imagine we could come up with some interface to expose the action sketches if this is the sort of thing that would facilitate your work here. |
@robbertvanginkel It would be pretty awesome if your approach could be open sourced while Bazel does not have the tools needed available. We are in the early stages of our Bazel adoption and this is our biggest pain point at the moment. We would gladly help with making it more robust. @ericfelly Is this some form of the transitive keys that @haxorz talked about? |
Yes these are a form of transitive keys. We don't have an interface which exposes them directly. What would you like it to look like? |
Is it possible to expose it in aquery via a flag? |
@meisterT do you think the action sketches could be exposed in aquery? how difficult would that be? |
I am not yet familiar with action sketches. When are they computed? |
They are currently computed when you have top-down caching (experimental feature) enabled. See ActionSketchFunction. What you could do is, when you run the analysis phase, launch the ActionSketchFunction for each action you come across. Then you could expose the action sketch in the output of aquery. |
We have a similar use-case. The question we want to answer is simply “what targets changed since the last master commit” to determine which need to be deployed. What form factor that output has is not as important (diff a set of hashes between two git commits straight in bazel, a list of targets + hash as build artifact and then a manual diff, etc). While hashing the deployment files works as a workaround, a query answering the question “what targets changed between commit A and B” seems very useful. |
@robbertvanginkel @linzhp @vitarb Could you elaborate on what part of the query output you use to calculate the hash for each target? I'm taking a look at doing something similar but I'm not so sure what parts of the proto output I should be interested in. |
Looping back here one last time! The algorithm in this Gist is working well in our CI systems, and we have not seen it miss anything yet. Good luck to anyone else trying to implement this by hand. |
@robbertvanginkel @linzhp Has there been any luck with open sourcing what you have at Uber? I took a stab at this myself but I have a feeling that I might have missed some edge cases: https://github.com/purkhusid/biff |
@purkhusid I tried out |
@rohansingh Awesome! I've been meaning to put some more time into it and add some tests to validate that it does the right thing. If you have any improvements you would like to add to it then don't hesitate to send a PR/create an issue. |
Internally at Google, we use something more like @haxorz 's (1). For that reason, I think it's unlikely that we'll prioritize exposing such a transitive hash. However, the action sketch mentioned by @ericfelly seems like it could be a good option in concert with aquery. It's basically all open-sourced already, so PRs to integrate it seem reasonable, either from Google or other contributors. |
I finally got around to open sourcing our target selection system, seen here https://github.com/Tinder/bazel-diff. It is a ready to go CLI system to allow you to perform target selection on massive codebases (handles massive Bazel query argument lists, and massive Bazel Protobuf's via the streamed-proto output option) |
@tinder-maxwellelliott Cool! I skimmed through your code and it LGTM! Fyi: You would have been bit by #12086 (specifically at https://github.com/Tinder/bazel-diff/blob/master/src/main/java/com/bazel-diff/BazelRule.java#L24), so I just wanted to make sure you're aware of that bug (and its fix!). |
Basically we cannot rely on |
Yes, using The bug occurs only in the situation described in the issue title. Maybe that doesn't happen in the codebase in which you use your The bug was introduced in commit 9f2cab5 on 12 May. I don't know offhand which Bazel release first included that commit, nor do I know which release will first include the fix. I can look up that info for you if you want. |
Bazel v3.3.0 is the first release that included the bug. |
Yes, the |
cc @aiuto for the 3.5.1 patch release and @laurentlb for the 3.6.0 release |
To save people from digging, it seems the issue @haxorz mentioned is fixed in 3.7. |
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team ( |
This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team ( |
this would still be useful. can we reopen? |
cc @bazelbuild/triage |
Description of the problem / feature request:
Buck has the concept of rule keys, which we can obtain by running
buck targets --show-rulekey //...
We'd love to see a similar feature in Bazel.
Feature requests: what underlying problem are you trying to solve with this feature?
We need to see which targets have changed, and thus need to be rebuilt and tested, from one revision to another. In order to keep master green at scale, we need to build and test diffs in parallel. Getting the rule key for each target helps us decide which diffs are independent and safe to build and test in parallel.
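One way to read "independent" here, offered as an assumption and not as the requesters' actual criterion, is that two diffs are independent when the sets of targets whose keys they change do not overlap. A minimal sketch, with hypothetical diff ids and target labels:

```python
from itertools import combinations


def independent_pairs(affected_by_diff):
    """Return pairs of diff ids whose affected target sets are disjoint.

    affected_by_diff: diff id -> set of target labels whose key changed
                      under that diff (hypothetical input format).
    """
    return [
        (a, b)
        for a, b in combinations(sorted(affected_by_diff), 2)
        if affected_by_diff[a].isdisjoint(affected_by_diff[b])
    ]


affected_by_diff = {
    "D1": {"//a:lib", "//a:test"},
    "D2": {"//b:other"},
    "D3": {"//a:test"},
}
pairs = independent_pairs(affected_by_diff)
print(pairs)
```

In this example D1 and D3 both affect //a:test, so they would need to be tested together or in sequence, while D2 can proceed in parallel with either.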