Add option to write bazel query output directly to a file #24293

Closed
keithl-stripe opened this issue Nov 12, 2024 · 1 comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc team-Performance Issues for Performance teams type: feature request

Comments

@keithl-stripe
Contributor

keithl-stripe commented Nov 12, 2024

Description of the feature request:

Our repository contains about 700,000 targets. We use the output of bazel query to improve CI performance, by restricting the Bazel build to changed targets and their transitive dependencies (similar to bazel-diff).

Specifically, we run:

bazel query --output=streamed_proto //...

This produces a 6.8 GB file, and a roughly cold run takes:

  • 16 seconds to download/unpack external repos
  • 14 seconds to parse all the BUILD.bazel files
  • 4 seconds to evaluate the query expression
  • 1 minute, 35 seconds to render the protos to stdout

We'd like to speed up this last step, as it accounts for about 74% of the total wall time.
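The 74% figure follows directly from the four phases listed above. A quick check in Python, using only the reported timings (1 minute 35 seconds = 95 seconds for the rendering step):

```python
# Reported phase timings in seconds for a roughly cold run.
phases = {
    "fetch/unpack external repos": 16,
    "parse BUILD.bazel files": 14,
    "evaluate query expression": 4,
    "render protos to stdout": 95,  # 1 minute 35 seconds
}

total = sum(phases.values())
render_share = phases["render protos to stdout"] / total

print(f"total: {total} s")                     # total: 129 s
print(f"rendering share: {render_share:.1%}")  # rendering share: 73.6%
```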

Through Java profiling (via YourKit and Java Flight Recorder), we've noticed that Bazel spends a lot of CPU and wall time marshaling the query output into gRPC messages to send back to the Bazel client. Writing directly to a file would eliminate this overhead.
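Downstream, the CI tooling still has to parse this output. Bazel's query documentation describes `--output=streamed_proto` as a stream of length-delimited `Target` protocol buffers, i.e. each message is preceded by its byte length encoded as a base-128 varint. Below is a minimal Python sketch, assuming that framing, that splits such a stream into raw message payloads. The names `iter_delimited` and `_read_varint` are hypothetical. Decoding each payload into a `Target` would additionally require Python bindings generated from Bazel's `build.proto`, which are not shown.

```python
import io
from typing import BinaryIO, Iterator, Optional


def _read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one base-128 varint; return None on a clean EOF before any byte."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None  # clean end of stream between records
            raise EOFError("truncated varint length prefix")
        byte = b[0]
        result |= (byte & 0x7F) << shift  # low 7 bits carry the value
        if not byte & 0x80:               # high bit clear: last byte
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-delimited message payload as undecoded bytes."""
    while True:
        length = _read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated message payload")
        yield payload


if __name__ == "__main__":
    demo = io.BytesIO(b"\x03abc\x02hi")
    print(list(iter_delimited(demo)))  # [b'abc', b'hi']
```

To use it on real query output, open the file with `open(path, "rb")` and iterate over `iter_delimited(f)`. Because the reader only moves forward, the same function would also work on a pipe.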

Which category does this issue belong to?

Core, Performance

What underlying problem are you trying to solve with this feature?

Improve bazel query performance when the output is destined for a file

Which operating system are you running Bazel on?

Linux Ubuntu 24.04.1

What is the output of bazel info release?

release 7.2.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@michajlo
Contributor

We actually used to have something like this for an internal output-formatter implementation, and we would attach a FIFO (named pipe) so that we could pipeline result processing. Unfortunately, we ran into a lot of issues with pipes, Java, and interrupt handling, so we abandoned that approach in favor of reading results directly from Blaze's gRPC interface. That was much faster than reading them through the Bazel C++ client (the bottleneck at that point), but it does require knowing how to talk to Bazel directly over gRPC. This was a while ago, so I'm not sure what the current state of performance is for any of these approaches.

Anyway, I bring this up in case you were considering any sort of similar pipelining using this flag.

@joeleba joeleba added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Nov 19, 2024
copybara-service bot pushed a commit that referenced this issue Nov 27, 2024
…rectly to a file

This is a proposed fix for #24293

This speeds up a fully warm `bazel query ...` by 23.7%, reducing wall time from 1m49s to 1m23s

```
$ time bazel query '...' --output=streamed_proto > queryoutput4.streamedproto

real    1m48.768s
user    0m27.410s
sys     0m19.646s

$ time bazel query '...' --output=streamed_proto --output_file=queryoutput5.streamedproto

real    1m22.920s
user    0m0.045s
sys     0m0.016s
```

_💁‍♂️ Note: when combined with #24305, total wall time is 37s, an overall reduction of 66%._

Closes #24298.

PiperOrigin-RevId: 700583890
Change-Id: Ic13f0611aca60c2ce8641e72a0fcfc330f13c803
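The percentages in the commit message can be recomputed from the `real` times in the transcript. A small sketch follows. The 37 s figure is the combined total quoted in the note about #24305, not something measured here.

```python
baseline = 108.768     # real time (s): streamed_proto redirected from stdout
output_file = 82.920   # real time (s): with --output_file
combined = 37.0        # reported total (s) when combined with #24305


def reduction(before: float, after: float) -> float:
    """Fractional reduction in wall time going from `before` to `after`."""
    return (before - after) / before


print(f"--output_file alone: {reduction(baseline, output_file):.1%}")  # 23.8%
print(f"combined with #24305: {reduction(baseline, combined):.1%}")    # 66.0%
```

The exact reduction is about 23.76%, which the commit message reports as 23.7%.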
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Dec 2, 2024