Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bazel query --output_file option, which writes query results directly to a file #24298

Closed

Conversation

keithl-stripe
Copy link
Contributor

@keithl-stripe keithl-stripe commented Nov 12, 2024

This is a proposed fix for #24293

This speeds up a fully warm bazel query ... by 23.7%, reducing wall time from 1m49s to 1m23s

$ time bazel query '...' --output=streamed_proto > queryoutput4.streamedproto

real    1m48.768s
user    0m27.410s
sys     0m19.646s

$ time bazel query '...' --output=streamed_proto --output_file=queryoutput5.streamedproto

real    1m22.920s
user    0m0.045s
sys     0m0.016s

💁‍♂️ Note: when combined with #24305, total wall time is 37s, an overall reduction of 66%.

@iancha1992 iancha1992 added team-Performance Issues for Performance teams team-Core Skyframe, bazel query, BEP, options parsing, bazelrc labels Nov 12, 2024
if (outputFile == null) {
return createInternal(env.getReporter().getOutErr().getOutputStream());
}
Path fullPath = env.getWorkspace().getRelative(outputFile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be CommandEnvironment#getWorkspace or CommandEnvironment#getWorkingDirectory? The latter should let you store the file beneath the workspace root?

@michajlo
Copy link
Contributor

We're working on channeling anything (via flags) that writes to the local filesystem outside of the output tree to the InstrumentationOutput abstraction? Could you see if that could be used without too much hardship? We're still working out the ergonomics, so I apologize if it might not be straightforward (yet).

@keithl-stripe keithl-stripe requested a review from a team as a code owner November 13, 2024 14:22
@keithl-stripe
Copy link
Contributor Author

We're working on channeling anything (via flags) that writes to the local filesystem outside of the output tree to the InstrumentationOutput abstraction? Could you see if that could be used without too much hardship? We're still working out the ergonomics, so I apologize if it might not be straightforward (yet).

I'm OK with trying this, but from the name and docs, InstrumentationOutput does not quite seem like a good fit. The file produced --output_file is the entire result of the bazel query command, not just instrumentation/metrics about Bazel itself.

@michajlo
Copy link
Contributor

InstrumentationOutput does not quite seem like a good fit. The file produced --output_file is the entire result of the bazel query command, not just instrumentation/metrics about Bazel itself.

Agreed the name isn't ideal, but it's the same thing in spirit. I think this just might be the ~first1 time we've had something like this, or the first we've encountered it since taking inventory and naming this concept

Footnotes

  1. We actually had something like the proposed flag that we removed a while ago, as noted in https://github.com/bazelbuild/bazel/issues/24293#issuecomment-2473826800.

@keithl-stripe
Copy link
Contributor Author

InstrumentationOutput does not quite seem like a good fit. The file produced --output_file is the entire result of the bazel query command, not just instrumentation/metrics about Bazel itself.

Agreed the name isn't ideal, but it's the same thing in spirit. I think this just might be the ~first1 time we've had something like this, or the first we've encountered it since taking inventory and naming this concept

Footnotes

  1. We actually had something like the proposed flag that we removed a while ago, as noted in Add option to write bazel query output directly to a file #24293 (comment).

I just tried using InstrumentationOutput and it doesn't seem like a great fit in its current form:

  1. Doesn't allow paths relative to the user's working directory
  2. Doesn't allow introspecting the resolved path (I use the path in an error message)
  3. I don't see how to integrate it with LazyFileOutputStream

@keithl-stripe
Copy link
Contributor Author

I just tried using InstrumentationOutput and it doesn't seem like a great fit in its current form:

  1. Doesn't allow paths relative to the user's working directory
  2. Doesn't allow introspecting the resolved path (I use the path in an error message)
  3. I don't see how to integrate it with LazyFileOutputStream

Update: I fixed 1 and 2 myself, and @jin said I don't need 3.

switch (destinationRelativeTo) {
case OUTPUT_BASE -> env.getOutputBase().getRelative(destination);
case WORKSPACE_OR_HOME -> env.getWorkspace().getRelative(destination);
case WORKING_DIR -> env.getWorkingDirectory().getRelative(destination);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the author of InstrumentationOutput, I want to sincerely thank you for adopting InstrumentationOutput.

However, please note that env.getWorkingDirectory() is not "getting the directory from which the user ran Bazel". From its implementation and JavaDoc, this method returns "working directory of the server", which could be identical to workspace, in most cases I think. I apologize for this confusion. And this is painful for us as well, which makes us making the initial assumption to conflate workspace and working directory.

As of now, we are researching about fixing this flawed assumption. So it is totally fine to opt out of adopting InstrumentationOutput in this change. I think we can do the switch after we know how to sort out this ball of mud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the explanation! Note: you linked to BlazeDirectories.getWorkingDirectory, which you described accurately; but I was not calling that; I was calling CommandEnvironment.getWorkingDirectory(), which is the client working directory :)

/**
* Returns the working directory of the {@code blaze} client process.
*
* <p>This may be equal to {@code BlazeRuntime#getWorkspace()}, or beneath it.
*
* @see #getWorkspace()
*/
public Path getWorkingDirectory() {
return workingDirectory;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misread on this. Then this looks good to me.

return new FileQueryRuntimeHelper(instrumentationOutput);
} catch (IOException e) {
throw new QueryRuntimeHelperException(
"Could not open query output file " + instrumentationOutput.getHumanReadableName(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you alluded to this in my other comment - is this the absolute path? a perhaps not-well-documented thing we like to try to do is avoid absolute paths in user-visible messaging as much as possible, this way if the local filesystem (CI etc) doesn't match the end-user's filesystem there's less confusion. also, as you may have deduced from the instrumentation-output abstraction, this path might not reflect the actual file we were operating on.

that said, logging absolute paths to info logging should be fine.

Copy link
Contributor Author

@keithl-stripe keithl-stripe Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, okay. Personally when Bazel outputs a path, it can be difficult to understand which of Bazel's many directories it is relative to. So I would prefer Bazel only print absolute paths :)

@keithl-stripe keithl-stripe requested a review from jin November 14, 2024 16:55
Copy link
Member

@jin jin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo tests.

@zhengwei143 zhengwei143 self-requested a review November 15, 2024 10:14
@keithl-stripe keithl-stripe requested a review from jin November 18, 2024 00:03
@jin jin added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 18, 2024
@jin
Copy link
Member

jin commented Nov 20, 2024

Your added test is failing internally. I'll attempt to fix it, but apologies if there's a delay.

@keithl-stripe
Copy link
Contributor Author

Your added test is failing internally. I'll attempt to fix it, but apologies if there's a delay.

Thank you so much for the update! Hope they are not too onerous to fix.

@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Nov 27, 2024
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Dec 2, 2024
…rectly to a file

This is a proposed fix for bazelbuild#24293

This speeds up a fully warm `bazel query ...` by 23.7%, reducing wall time from 1m49s to 1m23s

```
$ time bazel query '...' --output=streamed_proto > queryoutput4.streamedproto

real    1m48.768s
user    0m27.410s
sys     0m19.646s

$ time bazel query '...' --output=streamed_proto --output_file=queryoutput5.streamedproto

real    1m22.920s
user    0m0.045s
sys     0m0.016s
```

_💁‍♂️ Note: when combined with bazelbuild#24305, total wall time is 37s, an overall reduction of 66%._

Closes bazelbuild#24298.

PiperOrigin-RevId: 700583890
Change-Id: Ic13f0611aca60c2ce8641e72a0fcfc330f13c803
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc team-Performance Issues for Performance teams
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants