Make CI workflow resilient to multiple runs on the same sha or tag #3635
I'm wondering when two runs on the same sha can occur. Is it only when an empty commit is pushed to a branch to trigger the build while that branch still has a build running?
Like a new tag and the latest master HEAD? Or do we handle that differently now?
@alpeb @ihcsim Good points. I've updated the issue title and description to better describe the situation. We don't exactly rely on the sha, but rather on the output of `bin/root-tag`. The situations where this can occur are covered in the updated issue description.
Turns out the `job` and `runner` contexts are pretty limited:

```
job: {
  "status": "Success"
}
runner: {
  "os": "Linux",
  "tool_cache": "/opt/hostedtoolcache",
  "temp": "/home/runner/work/_temp",
  "workspace": "/home/runner/work/linkerd2"
}
```

and I couldn't find anything else useful. I'm gonna try generating a random number in bash and storing it in the workflow's temp dir. Will have to check that dir is not shared across runs.
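A minimal sketch of that idea (step names are mine; it assumes the temp dir is shared across jobs, which a later comment shows is not the case):

```yaml
- name: Generate a run-unique token
  run: |
    # bash's $RANDOM is only 15 bits; concatenate two draws for more entropy
    echo "${RANDOM}${RANDOM}" > "${RUNNER_TEMP}/run-token"
- name: Use the token in a shared object name
  run: |
    TOKEN="$(cat "${RUNNER_TEMP}/run-token")"
    echo "would create cluster kind-$(bin/root-tag)-${TOKEN}"
```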
@alpeb I could be wrong, but I wonder if the GitHub Actions UI is obfuscating additional fields from that dump: https://github.com/linkerd/linkerd2/runs/273480582#step:2:50
I tested this:

```yaml
- name: Dump runner.tracking_id
  env:
    RUNNER_ID: ${{ runner.tracking_id }}
  run: echo "$RUNNER_ID"
- name: Check if runner.tracking_id is empty
  env:
    RUNNER_ID: ${{ runner.tracking_id == '' }}
  run: echo "$RUNNER_ID"
- name: Dump runner.tracking.id
  env:
    RUNNER_ID: ${{ runner.tracking.id }}
  run: echo "$RUNNER_ID"
- name: Check if runner.tracking.id is empty
  env:
    RUNNER_ID: ${{ runner.tracking.id == '' }}
  run: echo "$RUNNER_ID"
```

which showed that neither field is populated.
Turns out not all environment variables are carried over between jobs.
Answering myself :-P I had forgotten that each job uses a brand-new environment, so the temp dir starts anew for each job. OTOH, I stumbled upon this, which describes a way to retrieve the "check suite ID" through GitHub's API.
@alpeb Thanks for all the digging. That check suite approach is interesting, though I think it only grabs the latest check suite. I tested with:

```bash
curl -s -H "Accept: application/vnd.github.antiope-preview+json" \
  "https://api.github.com/repos/linkerd/linkerd2/commits/4ea87eae0dfbbad33698155a75b1842573020321/check-suites"
```

I'm reluctant to just fail the build. Let's hold off on doing this until there's a way to uniquely identify a workflow run (or it becomes painful enough to warrant failing the build).
I don't think I understand your objection. From what I've tested, that API returns an array where each element corresponds to a GitHub Actions build run triggered for the supplied sha. Each element describes the status (running, completed), the conclusion (success, failure), and (from the looks of it) an id that is unique to that build run. I'm ok with holding off on this one; just wanted to make sure we're on the same page.
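For illustration, a hedged sketch of what a "fail the build" guard built on that API could look like (the step name and `jq` filter are mine, not anything the thread settled on):

```yaml
- name: Fail if another check suite is running on this sha
  run: |
    # Count the check suites on this commit that haven't completed yet.
    active=$(curl -s \
      -H "Accept: application/vnd.github.antiope-preview+json" \
      "https://api.github.com/repos/linkerd/linkerd2/commits/${GITHUB_SHA}/check-suites" \
      | jq '[.check_suites[] | select(.status != "completed")] | length')
    # This run's own suite counts as one; anything more is a concurrent run.
    if [ "${active}" -gt 1 ]; then
      echo "Another CI run is active on ${GITHUB_SHA}; aborting."
      exit 1
    fi
```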
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Fixes #3911. Refactors the `cloud_integration` test to run in separate GKE clusters that are created and torn down on the fly. It leverages a new "gcloud" GitHub action that is also used to set up gcloud in other build steps (`docker_deploy` and `chart_deploy`). The action also generates unique names for those clusters, based on the git commit SHA and `run_id`, a recently introduced variable that is unique per CI run and available to all the jobs. This fixes part of #3635 in that CI runs on the same SHA don't interfere with one another (in the `cloud_integration` test; still to do for `kind_integration`).

The "gcloud" GH action is supported by `.github/actions/gcloud/index.js`, which has a couple of dependencies. To avoid having to commit `node_modules`, after every change to that file one must run

```bash
# only needed the first time
npm i -g @zeit/ncc

cd .github/actions/gcloud
ncc build index.js
```

which generates the self-contained file `.github/actions/gcloud/dist/index.js`. (This last part might get easier in the future after other refactorings outside this PR.) The "gcloud" GH action was later moved to its own repo: https://github.com/linkerd/linkerd2-action-gcloud
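A hedged sketch of the naming scheme described above (step ids and output names are assumptions, not the PR's actual code):

```yaml
- name: Compute a per-run cluster name
  id: cluster_name
  run: |
    # bin/root-tag derives a tag from the checked-out commit; github.run_id
    # is unique per workflow run, so together they avoid collisions.
    echo "::set-output name=cluster::testing-$(bin/root-tag)-${{ github.run_id }}"
- name: Create the GKE cluster
  run: gcloud container clusters create "${{ steps.cluster_name.outputs.cluster }}"
```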
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Fixed in #4033!
Background
PR #3602 guaranteed that all jobs in a CI workflow received the same copy of the repo. This made our CI workflow resilient to code changes occurring during the workflow run, both in master and the PR itself.
Problem
Our workflow uses the sha of the repo to name global objects that are shared between jobs within a workflow. When two workflow runs occur on the same sha or tag, the runs collide, specifically in the names of the kind clusters and git repo artifacts they create.
Collision examples
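A hedged illustration of the kind-cluster case (the resolved name and failure mode are representative, not copied from a real run):

```yaml
# Two runs on the same sha both execute a step like:
- run: kind create cluster --name "kind-$(bin/root-tag)"
# Both resolve to the same cluster name (e.g. kind-git-397970e0), so the
# second create fails because nodes already exist for that name, or the
# two runs end up sharing one cluster's state.
```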
Proposal
Assign unique names to kind clusters and git repo artifacts based on the current workflow run, in addition to the output of `bin/root-tag`. For example, instead of a name derived from `bin/root-tag` alone, do something like appending a per-run identifier (a sketch follows below).
(I'm not sure if `runner.tracking_id` exists or is too long, but we need something unique like this, or something in the `job` context.) More info at https://help.github.com/en/github/automating-your-workflow-with-github-actions/contexts-and-expression-syntax-for-github-actions.
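A hedged reconstruction of the before/after idea (the env var and the `runner.tracking_id` field are assumptions; the thread later found the field doesn't exist):

```yaml
# Before: every run on the same sha/tag computes the same name
- run: echo "KIND_CLUSTER=kind-$(bin/root-tag)"
# After: also mix in something unique per workflow run
- run: echo "KIND_CLUSTER=kind-$(bin/root-tag)-${{ runner.tracking_id }}"
```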
Example code sites requiring update

kind cluster name: `linkerd2/.github/workflows/workflow.yml`, lines 312 to 313 at 397970e
git repo artifact: `linkerd2/.github/workflows/workflow.yml`, lines 78 to 79 at 397970e
kubeconfig file: `linkerd2/.github/workflows/workflow.yml`, line 391 at 397970e
Note this filename is based on the kind cluster name, so fixing the kind cluster naming should fix it automatically (see the sketch below).
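For instance, a hedged sketch of that dependency (assumes the `kind get kubeconfig-path` subcommand from kind releases of that era):

```yaml
- name: Point kubectl at the per-run cluster
  run: |
    # kind names the kubeconfig file after the cluster, so a unique
    # cluster name yields a unique kubeconfig path for free.
    export KUBECONFIG="$(kind get kubeconfig-path --name="${KIND_CLUSTER}")"
    kubectl cluster-info
```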