Add continuous benchmarks #4537
See also #2791. We already run benchmarks in CI; however, that run is rather unreliable, as the runs may have noisy neighbors. Switching to a dedicated environment would definitely help.
@MadVikingGod, I missed the context here. Could you please elaborate on the security issue of running a benchmark CI against every PR? Thank you
Also note that pushing the benchmark results to GitHub Pages seems to be outside the acceptable use for GitHub apps/extensions, since bots should only be reading data, not writing to the repository. https://github.com/open-telemetry/community/blob/main/docs/using-github-extensions.md
We accept PRs from everyone, and because those PRs can change what commands are run on the dedicated hardware, it is a security risk to have them run on every PR. GitHub does not recommend using dedicated runners with a public repository.
@MadVikingGod, is it OK if I assign this to myself? I want to push this issue forward.
To make continuous benchmarks work, we need to solve at least the following issues:
Proposal:
Other optional goals:
There have been several discussions about this at the community level, about setting this up in an automated way where no single human would hold the keys. No definitive solution has emerged yet (the closest one would be actuated, but that is not dedicated runners AFAIK).
IIUC, we have two types of GitHub runners available to use:
Related issue:
I can try actuated's machines first to see if they can produce stable benchmark results. If they don't, I can try
testing, but not benchmarks?
Yes, not benchmarks, only testing. https://github.com/open-telemetry/opentelemetry-collector/blob/227fb823dbf33e6a62b842aef5a70e10ed1db84f/.github/workflows/build-and-test-arm.yml#L27 IIUC, the collector CI is not running any benchmarks.
Update: the actuated runner cannot run for a forked repository: https://github.com/XSAM/opentelemetry-rust/actions/runs/9968339009/job/27543447028 (the job will wait forever for a runner to pick it up).
Tasks
@XSAM the resource limits I was thinking about in the SIG meeting were actually just Go flags: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/4301d25c8fbe97a8153eca91acc370b990011af2/.github/workflows/build-and-test-windows.yml#L53-L54 I'm not sure that will have an impact on the consistency of benchmarks. You can likely just disregard this.
Part of #4537

Changes:
- Run the benchmarks against every commit on the main branch.
- Use the actuated runner `actuated-arm64-4cpu-4gb` for the benchmarks.
Part of #4537

Here are the recent benchmark results from the CI:
- https://github.com/open-telemetry/opentelemetry-go/actions/runs/10067601185
- https://github.com/open-telemetry/opentelemetry-go/actions/runs/10068023613
- https://github.com/open-telemetry/opentelemetry-go/actions/runs/10072452529

Though the results are pretty stable, these runs cover only a very limited set of benchmarks, which cannot reflect the stability of running more benchmarks.

---

This PR enables all benchmarks in CI so we can obtain accurate stability results. I also removed the `timeout` flag for the benchmark test, as some benchmarks contain a lot of sub-benchmarks, which would time out anyway under a 60-second timeout. (Go does not offer a timeout for a single benchmark or test.)
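For reference, a benchmark step along these lines might look like the sketch below. This is a hedged illustration, not the exact command the workflow uses; the flags shown simply run every benchmark once, skip unit tests, and report allocations.

```yaml
# Hypothetical workflow step: run all benchmarks across all packages.
# -run=^$ skips regular tests, -bench=. selects every benchmark, and
# -benchmem reports allocs/op and B/op; no -timeout flag is passed, so
# long sub-benchmark suites rely on the job-level timeout instead.
- name: Run benchmarks
  run: go test -run=^$ -bench=. -benchmem ./...
```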
…s commit (#5664)

Part of #4537

The benchmark action does not compare against the exact result from the previous commit. For instance, a recent action, https://github.com/open-telemetry/opentelemetry-go/actions/runs/10175392022, compares against 2a454a7, a commit that was pushed a week ago.

This is because `actions/cache@v4` will not refresh the cache when the key already exists. These are the logs from https://github.com/open-telemetry/opentelemetry-go/actions/runs/10175392022/job/28170968531 on the `Post Download previous benchmark data` step:

```
Cache hit occurred on the primary key Linux-benchmark, not saving cache.
```

Moreover, GitHub Actions only evicts cache keys that have not been accessed in over 7 days, which in our case means never, as long as we keep pushing new commits to the main branch. https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy

---

To fix this, I use the current SHA as the cache key and then use the previous SHA to fetch the cache, so our benchmark can compare against the exact result from the previous commit.

Action result from my fork: https://github.com/XSAM/opentelemetry-go/actions/runs/10189773887/job/28188416879
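A minimal sketch of that cache arrangement is below; the path, step name, and key format are illustrative assumptions, not the workflow's actual values. The idea is that keying on the current SHA forces a fresh save on every push, while the restore key points at the entry written by the previous commit.

```yaml
# Hypothetical cache step: save under the current commit's SHA, restore
# from the previous commit's entry so each run compares against its parent.
- name: Download previous benchmark data
  uses: actions/cache@v4
  with:
    path: ./benchmarks                                   # illustrative results path
    key: ${{ runner.os }}-benchmark-${{ github.sha }}    # always a new key per commit
    restore-keys: |
      ${{ runner.os }}-benchmark-${{ github.event.before }}
```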
Part of #4537

These are some benchmark results from the `actuated` runner. It is pretty clear to me that this runner does not provide a stable enough environment to produce stable benchmark results:
- https://github.com/open-telemetry/opentelemetry-go/actions/runs/10278899889
- https://github.com/open-telemetry/opentelemetry-go/actions/runs/10275384692

Thus, I am switching to the self-hosted runner, which uses bare metal, to run the benchmarks. This is the request to use this type of runner: open-telemetry/community#2266. The underlying machine is an m3.small.x86 with 8 cores and 64 GiB of memory. https://deploy.equinix.com/product/servers/m3-small/

---

This is an example of using this runner from OTel Java: https://github.com/open-telemetry/opentelemetry-java/actions/runs/10277337397/job/28439208528

---

This runner cannot be triggered from a forked repo: XSAM@faab7b9. So, it is quite safe if we only let it run on the main branch.
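A rough sketch of how such a workflow could be limited to pushes on `main` and pointed at the bare-metal machine is shown below; the runner label is a placeholder, since the real label is assigned when the org registers the runner.

```yaml
# Hypothetical workflow skeleton: benchmarks run only for pushes to main,
# never for pull requests, so forked PRs cannot reach the self-hosted runner.
name: Benchmark
on:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: self-hosted   # placeholder label for the bare-metal runner
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        run: go test -run=^$ -bench=. -benchmem ./...
```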
Part of #4537, replaces #5671

I tried to use `save-always` to simplify this, but the flag never worked: actions/cache#1315. So, I split the cache action into two steps, one for restore and one for save. The save step always runs, even if an earlier step failed.

Some actions I ran to prove it works:
- On success: https://github.com/XSAM/opentelemetry-go/actions/runs/10292883964/job/28488154161
- On failure: https://github.com/XSAM/opentelemetry-go/actions/runs/10292907887/job/28488227777
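A minimal sketch of the restore/save split, reusing the same illustrative path and key as above; `actions/cache/restore` and `actions/cache/save` are the official sub-actions of `actions/cache`, and `if: always()` is what keeps the save step running after a failure.

```yaml
# Hypothetical fragment: restore at the start of the job...
- name: Download previous benchmark data
  uses: actions/cache/restore@v4
  with:
    path: ./benchmarks
    key: ${{ runner.os }}-benchmark-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-benchmark-${{ github.event.before }}

# ...benchmark and comparison steps go here...

# ...and save at the end, even if an earlier step failed.
- name: Save benchmark data
  if: always()
  uses: actions/cache/save@v4
  with:
    path: ./benchmarks
    key: ${{ runner.os }}-benchmark-${{ github.sha }}
```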
For example, the
It might be worth introducing a
Problem Statement
I want to understand how the Go SDK's performance has changed over time.
Proposed Solution
Use the newly available dedicated environments in the OTel org to run benchmarks automatically.
Tasks to make this happen
[1]: Currently, there are approximately 300 benchmarks with 3 data points each (ns/op, allocations/op, and B/op). The current tools will make a graph for each one, which is overwhelming.
[2]: This can be done via:
Other Notes
This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploitation. Instead, this should be run against releases.
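If the benchmarks were instead keyed to releases, the trigger could be sketched roughly as below (hypothetical; the work above ultimately settled on running against every main-branch commit instead).

```yaml
# Hypothetical trigger: run the benchmark workflow only when a release tag is pushed.
on:
  push:
    tags:
      - 'v*'
```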