Support running persistent workers remotely #10091
Comments
/cc @buchgr |
@tsiq-charliem there's the dynamic spawn scheduler (https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html), which will get you the best of both. We currently have no plans to work on worker support for remote execution. |
Closing as we don't plan to add persistent worker support for remote execution. |
I have a patch for this. It's pretty small, so it may be acceptable to check it in? |
Interesting, what does that look like? |
Add a new --experimental_remote_mark_tool_inputs flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value (this is just a boolean tag). Implements bazelbuild#10091. Change-Id: Iccb36081fee399855be7c487c2d4091cb36f8df3
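As a rough illustration of the commit message above, here is a minimal Python sketch of how a client might derive a `persistentWorkerKey` value and attach it to the platform properties of a request. Only the property name `persistentWorkerKey` comes from the commit message; the function names, the choice of SHA-256, and the decision to hash both paths and content digests are assumptions made for the example, not a description of what the patch actually does.

```python
import hashlib


def persistent_worker_key(tool_inputs):
    """Hash the tool inputs into a single key.

    tool_inputs: dict mapping a tool input path to its content digest
    (hex string). Hashing path + digest is an assumption of this sketch.
    """
    h = hashlib.sha256()
    # Sort by path so the key does not depend on iteration order.
    for path in sorted(tool_inputs):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(tool_inputs[path].encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def platform_properties(base_properties, tool_inputs):
    """Return sorted (name, value) pairs, adding the worker key if any tools exist."""
    props = dict(base_properties)
    if tool_inputs:
        props["persistentWorkerKey"] = persistent_worker_key(tool_inputs)
    return sorted(props.items())
```

A server that does not understand the extra property can ignore it, which matches the note in the thread that the annotation is safe to disregard.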
Support for remote persistent workers is one of our most requested features and we've seen significant performance improvements in some real-world scenarios with proprietary codebases. I've rebased my change to HEAD, but I still need to add some tests. |
Your patch seems reasonable as far as adding the tool signature to the Platform. As far as I read it, it's just adding the hash of the tool paths, not trying to get a stronger signature like the digest of the binaries, right? It's possible that splitting into multiple cache keys, one for each referenced tool, might be desirable because it gives the server more scheduling flexibility in terms of prioritizing one tool over another, but I'm not sure how frequent multi-tool actions are so it may not make much difference in practice. I'm a little skeptical of adding built-in support for this in Bazel without understanding what a workable server implementation looks like. When we've noodled around on this in the past, it's been hard to come up with something that was safe, flexible enough to handle varying workloads and multiple tools, and that provided reasonable affordances for debugging. Does this ultimately break down to per-worker pool targeting, combined with some server functionality to keep the tools up, allow for resets, etc.? |
Bazel already supports workers with a single worker 'tool' with a specific API (actually, there are two APIs - the vanilla API and the multiplex API). This PR only annotates the remote execution requests with just enough information to be able to implement the same API remotely. Note that it is safe for a server to ignore this information and just continue as usual. Also note that this is behind an experimental flag.
There are any number of ways to implement this on the server side. A per-worker pool is one way, although that doesn't seem very appealing to me. Generally speaking, we have found it straightforward to keep track of the most recent 'persistent worker key' for each worker and assign actions to a matching worker if possible. It may be necessary to overprovision worker resources to allow the scheduler sufficient leeway in assigning actions to workers.
Certainly, a first-come-first-served scheduler will struggle if there is queueing, as it won't be able to make meaningful decisions. However, a scheduler could also delay actions (say for a few hundred ms) or reorder the first few queue entries to generate more options. In the first case, there is a chance that another better-matching worker instance becomes available during that time. In the reordering case, there is a chance that another better-matching action is near the front of the queue (but this probably requires a safeguard to prevent actions from being skipped indefinitely).
On the positive side, we've seen performance improvements even if we can only find matching workers for a small percentage of actions. There's basically no downside to providing the extra information - the performance is virtually identical to the non-persistent-worker case if we can't ever schedule an action to a matching instance.
We have also seen cases where moving to remote builds without persistent workers is a significant performance regression compared to local builds because the action graph is not sufficiently wide (and given the inherent overhead of remote execution), and local builds already use persistent workers. Finally, people seem to be happy to enable this without particular regard to safety or security given the significant benefits we're seeing on the performance side. Given that people are happy to use remote caching (which has strictly worse safety and security), I find this entirely unsurprising. Debugging hasn't been an issue for us so far, maybe because persistent workers are already widely used for local execution. |
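The key-matching and reordering idea described in this comment can be sketched roughly as follows. This is a toy, single-threaded Python model and not the scheduler of any real remote execution service; the class name `Scheduler` and the parameters `lookahead` and `max_skips` are hypothetical. It remembers the most recent persistent worker key per worker, looks at the first few queue entries for a match, and falls back to the head of the queue once that entry has been skipped too often.

```python
from collections import deque


class Scheduler:
    def __init__(self, worker_ids, lookahead=3, max_skips=2):
        # Most recent persistent worker key seen on each worker (None = no live tool).
        self.last_key = {w: None for w in worker_ids}
        self.queue = deque()  # entries are [action_name, key, times_skipped]
        self.lookahead = lookahead
        self.max_skips = max_skips

    def submit(self, name, key=None):
        self.queue.append([name, key, 0])

    def assign(self, worker_id):
        """Pick an action for an idle worker; returns (name, reused) or None."""
        if not self.queue:
            return None
        worker_key = self.last_key[worker_id]
        chosen = 0  # default: plain first-come-first-served
        # Safeguard: once the head has been skipped max_skips times, stop reordering.
        if worker_key is not None and self.queue[0][2] < self.max_skips:
            for i in range(min(self.lookahead, len(self.queue))):
                if self.queue[i][1] == worker_key:
                    chosen = i
                    break
        # Everything ahead of the chosen entry was passed over once more.
        for i in range(chosen):
            self.queue[i][2] += 1
        name, key, _ = self.queue[chosen]
        del self.queue[chosen]
        reused = key is not None and key == worker_key
        self.last_key[worker_id] = key
        return name, reused
```

If no entry in the lookahead window matches, the sketch behaves exactly like a first-come-first-served queue, which mirrors the point above that the extra information costs nothing when no match is found.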
Ulf, could you turn the above into a little design doc and attach it to a PR for this change? |
Ulf, are you far enough along with this that you could do a design doc? Eric is concerned that getting remote workers to be safe and correct is not that easy, but it would be a great feature. |
+1 for a design doc. Ideally with some discussion on whether this is needed in all cases, or only to reduce latency on user-driven incremental builds - I'd be much less concerned if e.g. caching was disabled for remote-worker actions and there was no cross-user sharing of workers, as that'd significantly reduce the blast radius of issues while potentially keeping all the interesting benefits?
FWIW most groups I've worked with that enabled remote caching have poisoned their cache at least once, so I'd still suggest having a response plan for dealing with that :). |
I finally wrote a design doc: bazelbuild/proposals#219 |
Would it be helpful for remexec backends to implement their ends of remote persistent workers if they could depend on |
@ulfjack While playing around with your patch, I noted that the change doesn't fully specify the initial Unfortunately, I had to stuff it in the |
Splitting it out of the main worker package makes it easier to reuse the code in other implementations that dispatch requests to workers (e.g., for remote persistent workers, bazelbuild#10091).
What's the status of this? |
Not yet, but we started looking into this again just last week. We have to play around with the patch to see if it needs any change. |
Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag. Fixes bazelbuild#10091. Co-authored-by: Ulf Adams <[email protected]>
Add a new --experimental_remote_mark_tool_inputs flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value (this is just a boolean tag). Implements bazelbuild#10091. Change-Id: Iccb36081fee399855be7c487c2d4091cb36f8df3 (cherry picked from commit 526fb58)
Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag. Fixes bazelbuild#10091. Co-authored-by: Ulf Adams <[email protected]> Closes bazelbuild#16362. PiperOrigin-RevId: 482433908 Change-Id: I2a80834731fd0169c08d4beea348f61a323ca028
Description of the problem / feature request:
Looking for support for running persistent workers on remote hosts.
Feature requests: what underlying problem are you trying to solve with this feature?
Currently, actions can be run with the `remote` strategy or the `worker` strategy, but we'd like a way to get the benefits of a persistent worker on a remote build. Without this, our local builds with persistent workers outperform remote builds.
What operating system are you running Bazel on?
Ubuntu 16.04
What's the output of `bazel info release`?
1.0.0