[Umbrella Issue] Create an Image Promotion process #157

Closed
dims opened this issue Dec 5, 2018 · 73 comments
Labels
area/artifacts Issues or PRs related to the hosting of release artifacts for subprojects area/release-eng Issues or PRs related to the Release Engineering subproject lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@dims
Member

dims commented Dec 5, 2018

Split from #153 (see that for some context)

cc @javier-b-perez @mkumatag @listx

@dims
Member Author

dims commented Dec 5, 2018

Notes from Nov 28th meeting

AI: @dims to follow up with Javier (current google promoter KEP)
Two issues here:
storage/serving for arbitrary artifacts (including cloud-local mirrors)
Container registry for official images (including mirrors?)
AI: @bburns to contemplate registry mirroring, attestation, etc
AI: @justinsb to collab with @bburns
Undecided: do sub-projects push to one true staging area and trust each other not to step on each other's images, or push to per-subproject areas?

@dims dims changed the title from "[Umbrella Issue] Setup a GCR Registry for projects to use" to "[Umbrella Issue] Create an Image Promotion process" Dec 5, 2018
@dims
Member Author

dims commented Dec 5, 2018

Related to #158

@listx
Contributor

listx commented Dec 22, 2018

I'm planning to OSS a proof of concept tool that understands how to reconcile a registry manifest (basically a list of images with digest/tag mappings) from a source registry to a destination registry. This should be available in January 2019. Once that tool is open sourced, I can wire up a basic promotion process for a number of images into a test registry to demonstrate how it will work.

For now the tool prototype deals with 2 registries (source vs dest, aka "staging" vs "prod"), but it is trivial to extend it to deal with more than 1 destination (so that we can have mirrors for example). After it is open sourced we can have more discussions about it in its own repository (or continue in this issue, I guess?).
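To make the reconciliation idea above concrete, here is a minimal sketch, assuming a manifest is just a list of (image name, digest, tags) entries and that each registry's contents can be read as a mapping from image name to digest to a set of tags. The data model and the names `ManifestEntry`, `CopyAction`, and `plan_promotion` are hypothetical illustrations, not the promoter's actual API. It also shows the trivial extension to more than one destination (e.g. mirrors).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical registry snapshot: image name -> digest -> set of tags.
Registry = Dict[str, Dict[str, Set[str]]]


@dataclass(frozen=True)
class ManifestEntry:
    """One image that the manifest says must exist in every destination."""
    name: str
    digest: str
    tags: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class CopyAction:
    dest: str    # destination registry identifier
    name: str
    digest: str
    tag: str     # "" means copy the digest without a tag


def plan_promotion(manifest: List[ManifestEntry],
                   source: Registry,
                   destinations: Dict[str, Registry]) -> List[CopyAction]:
    """Return the copies needed to make every destination match the manifest.

    Note: this sketch does not detect tags that already point at a
    different digest in a destination; it would simply re-copy them.
    """
    actions: List[CopyAction] = []
    for entry in manifest:
        # Only promote digests that actually exist in the source (staging).
        if entry.digest not in source.get(entry.name, {}):
            raise ValueError(f"{entry.name}@{entry.digest} not found in source")
        for dest_id, dest in sorted(destinations.items()):
            existing = dest.get(entry.name, {}).get(entry.digest)
            if existing is None and not entry.tags:
                # Tagless entry missing from this destination: copy by digest.
                actions.append(CopyAction(dest_id, entry.name, entry.digest, ""))
            for tag in sorted(entry.tags):
                if existing is None or tag not in existing:
                    actions.append(CopyAction(dest_id, entry.name, entry.digest, tag))
    return actions
```

For example, with a manifest containing only `pause@sha256:aaa` tagged `3.1`, a destination that already holds that digest and tag produces no action, while an empty destination produces a single tagged copy.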

I'll be offline the rest of this month so I'll see you guys in January!

@spiffxp
Member

spiffxp commented Jan 9, 2019

/assign @dims
Seen as the "larger" umbrella issue that could maybe subsume #158 (which assumes we need GCR repos per project, maybe we find a way to promote images that doesn't require this)

@listx
Contributor

listx commented Jan 9, 2019

Happy New Year all!

The tool I've worked on was submitted for internal review yesterday (as part of Google's open sourcing process). After it gets approved, I will create a public demo of it in action and update this issue accordingly.

@dims
Member Author

dims commented Jan 9, 2019

very cool @listx thanks for the update and a very happy new year to you as well.

@listx
Contributor

listx commented Jan 15, 2019

Update: The project has been approved and the Container Image Promoter now lives in https://github.com/GoogleCloudPlatform/k8s-container-image-promoter. Work now begins on creating a public demo of it in action (I plan to devote cycles to this to get the demo working by the end of Q1 this year).

Once the demo is complete, I think it's just a matter of using it as a template for migrating some of the official images from gcr.io/google-containers to another (probably CNCF-owned) GCR. I just imagine a future where the K8s container image release process happens more transparently for the community. Hopefully the image promotion process is a solid step in that direction.

@listx
Contributor

listx commented Jan 29, 2019

The design doc for the demo around this can be found here: https://docs.google.com/document/d/1WGFt5ck_XGf71PO4c87UMPVU_4Q7AV-7tRV4Z6wmZL4/edit?usp=sharing

@listx
Contributor

listx commented Feb 16, 2019

Another update: I have a demo Prow cluster (http://35.186.240.68/) that's listening to all changes to the manifest in https://github.com/cip-bot/cip-manifest. That repo houses a manifest that is obviously only for demo purposes, but if you look at this PR: cip-bot/cip-manifest#2 you can see how a proposed change to the manifest triggers Prow jobs that perform a dry run of the promoter; merging that PR resulted in the promoter running for real (no dry run) and modifying the destination registry.

I would like to have kubernetes-sigs/promo-tools#7 fixed before we think about really using this for existing (large-ish?) GCRs. It's not a big show-stopper though.

So basically like 90% of the pieces are there; we just need to migrate the Prow job configs to either kubernetes/test-infra or somewhere else (the Prow jobs need to run on someone's cluster) and set up the right service-account permissions. I'm not sure where I should upload these Prow jobs; maybe kubernetes/test-infra? @BenTheElder wdyt?
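The presubmit/postsubmit wiring described above boils down to choosing between a dry run and a real run depending on whether a manifest change is merely proposed or has been merged. A minimal sketch, assuming a hypothetical `copy_image` callback stands in for the real registry write; `run_promotion`, the job-type strings, and the registry paths in the test are illustrative only, not the promoter's or Prow's interface.

```python
from typing import Callable, List, Tuple

# (destination, image reference) pairs, e.g. produced by a planning step.
Plan = List[Tuple[str, str]]


def run_promotion(plan: Plan, job_type: str,
                  copy_image: Callable[[str, str], None]) -> List[str]:
    """Execute or merely report a promotion plan.

    Presubmits (proposed manifest changes) only log what would happen;
    postsubmits (merged manifest changes) actually perform the copies.
    """
    if job_type not in ("presubmit", "postsubmit"):
        raise ValueError(f"unknown job type: {job_type}")
    dry_run = job_type == "presubmit"
    log: List[str] = []
    for dest, ref in plan:
        if dry_run:
            log.append(f"[dry-run] would copy {ref} -> {dest}")
        else:
            copy_image(ref, dest)  # the only side effect in this sketch
            log.append(f"copied {ref} -> {dest}")
    return log
```

Injecting the write as a callback keeps the dry-run path free of credentials, which fits the idea of reviewing the presubmit output before anything touches a production registry.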

@dims
Copy link
Member Author

dims commented Feb 16, 2019

@listx Nice!

+1 to add jobs to kubernetes/test-infra

Also, can we run the garbage collector in dry-run mode to check what, if anything, will get wiped out in the production registry before turning it on?

@dims
Member Author

dims commented Feb 16, 2019

/assign @thockin @BenTheElder

@listx
Contributor

listx commented Feb 16, 2019

@dims After we make garbage collection aware of manifest lists, sure (otherwise it will print a bunch of false positives about needing to delete tagless images that are referenced by manifest lists). The more I think about it, the more I want to just separate GC entirely from promotion. Less complexity per execution of the promoter is a good thing.

And also, we could make GC much smarter and safer, by "promoting" to a "graveyard" GCR, in case anyone deletes a tag from the manifest by accident. Just an idea.

Anyway, we could also just disable garbage collection for the time being as it's not a critical feature as far as promotion is concerned.
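To illustrate the false positives mentioned above: a naive garbage collector treats every digest absent from the manifest as garbage, including the tagless per-architecture images that a multi-arch manifest list points to. A minimal sketch of a manifest-list-aware check, assuming a hypothetical in-memory view of the destination registry in which each manifest list records the child digests it references. It only reports candidates, in the spirit of the dry-run suggestion, rather than deleting anything.

```python
from typing import Dict, Set


def gc_candidates(dest_digests: Set[str],
                  manifest_digests: Set[str],
                  manifest_list_children: Dict[str, Set[str]]) -> Set[str]:
    """Return digests in the destination that nothing in the manifest needs.

    dest_digests: every digest currently stored in the destination registry.
    manifest_digests: digests listed in the promoter manifest.
    manifest_list_children: manifest-list digest -> digests it references.
    """
    keep = set(manifest_digests)
    # Walk manifest lists so their (often tagless) children are kept too.
    frontier = list(manifest_digests)
    while frontier:
        parent = frontier.pop()
        for child in manifest_list_children.get(parent, set()):
            if child not in keep:
                keep.add(child)
                frontier.append(child)
    return dest_digests - keep
```

Passing an empty `manifest_list_children` reproduces the naive behavior and flags the per-architecture children as deletable.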

@dims
Member Author

dims commented Feb 17, 2019

@listx makes sense "disable garbage collection for the time being as it's not a critical feature as far as promotion" +1

I like the graveyard GCR too :)

@BenTheElder
Member

kubernetes/test-infra SGTM, I would poke @fejta about our strategy for "trusted" jobs, as this should be one.

+1 to dry-run first, not sure I understand the graveyard GCR 🙃

@dims
Copy link
Member Author

dims commented Feb 19, 2019

update on dockerhub integration kubernetes-sigs/promo-tools#9

@listx
Copy link
Contributor

listx commented Feb 20, 2019

+1 to dry-run first, not sure I understand the graveyard GCR

I was thinking that the graveyard GCR could host images that were deemed OK to delete (permanently) from a prod GCR. Thinking about this some more, though, maybe it's cleaner to just implement soft deletion: the promoter would "delete" an image by moving it to a different path within the same GCR.

Anyway, the idea behind keeping things around in the "graveyard" was to make sure we can undo image deletions, just in case we accidentally delete an image for whatever reason.
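A minimal sketch of the soft-deletion idea, assuming a registry modeled as a hypothetical mapping from repository path to {tag: digest} and an assumed `graveyard/` path prefix (not a convention the promoter defines). "Deleting" moves the tag under the graveyard prefix, so it can be restored later.

```python
from typing import Dict

# Hypothetical registry model: repository path -> {tag: digest}.
Registry = Dict[str, Dict[str, str]]

GRAVEYARD_PREFIX = "graveyard/"  # assumed convention, purely illustrative


def soft_delete(reg: Registry, repo: str, tag: str) -> None:
    """Move repo:tag under the graveyard prefix instead of deleting it.

    If the same tag was soft-deleted before, the graveyard copy is replaced.
    """
    digest = reg[repo].pop(tag)  # KeyError if the tag does not exist
    reg.setdefault(GRAVEYARD_PREFIX + repo, {})[tag] = digest
    if not reg[repo]:
        del reg[repo]  # drop repositories left with no tags


def restore(reg: Registry, repo: str, tag: str) -> None:
    """Undo a soft deletion by moving the tag back to its original path."""
    grave = GRAVEYARD_PREFIX + repo
    # Check for a conflict first so a failed restore loses nothing.
    if tag in reg.get(repo, {}):
        raise ValueError(f"{repo}:{tag} already exists; refusing to overwrite")
    digest = reg[grave].pop(tag)
    reg.setdefault(repo, {})[tag] = digest
    if not reg[grave]:
        del reg[grave]
```

Because the digest never leaves the registry, an accidental removal from the manifest becomes a cheap, reversible operation.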

@hh
Copy link
Member

hh commented Feb 20, 2019

Action Items from February 20th Meeting:

  • @thockin / @listx Promoter + Prow Job
  • @thockin to provide a list of staging repos / groups who own them
  • @thockin Set up GCR Repo (scripted last week for staging)
  • @dims yaml file pushed to k8s.io repo
  • @listx Document procedures for eventually moving the image promoter repo to k8s-test-infra (the repo currently sits in a Google GitHub org)

I'm willing to help / coordinate with any of the above.

@javier-b-perez
Contributor

I have some security concerns about running this in Prow.
@thockin @listx Will the Prow job promote the container images? If so, Prow will require write access to the GCR. Do we trust that no one else can "promote" images using Prow?

@fejta
Contributor

fejta commented Feb 20, 2019

IMO these image promotion jobs should run in their own security domain:

  • Separate from CI jobs which run arbitrary presubmit code after an /ok-to-test
  • Separate from trusted prow binaries, which may have write access to many kubernetes repos.

AKA we trust them more than standard jobs (only run on merged code) and less than prow itself (approvals are not restricted to prow oncall).

A good way to solve these issues would be for the wg-k8s-infra team to:

  • create a cluster dedicated to running these jobs
  • give the jobs the necessary credentials to promote these images
  • configure prow to schedule these jobs in this new cluster (we already support and use this functionality).

Another idea might be to follow the pattern we use to have prow update itself:

That way the system is fully automated, but gated on someone trusted approving the PRs before they are used in production.
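The dedicated-cluster proposal above can be expressed as a policy check: promotion jobs only run on merged code and only on the cluster that holds the promotion credentials, and ordinary CI stays off that cluster. The sketch below is a hypothetical check over a simplified job description; the `Job` fields and the cluster name are assumptions, not Prow's configuration schema.

```python
from dataclasses import dataclass
from typing import List

# Assumed name for the dedicated build cluster; purely illustrative.
PROMOTION_CLUSTER = "k8s-infra-promoter"


@dataclass
class Job:
    name: str
    kind: str       # "presubmit" or "postsubmit"
    cluster: str    # build cluster the job is scheduled on
    promotes: bool  # True if the job needs registry write credentials


def policy_violations(jobs: List[Job]) -> List[str]:
    """Report jobs that would break the trust boundaries described above."""
    problems: List[str] = []
    for job in jobs:
        if job.promotes and job.kind != "postsubmit":
            # Presubmits can run unreviewed code after /ok-to-test.
            problems.append(f"{job.name}: promotion must only run on merged code")
        if job.promotes and job.cluster != PROMOTION_CLUSTER:
            problems.append(f"{job.name}: promotion must run on {PROMOTION_CLUSTER}")
        if not job.promotes and job.cluster == PROMOTION_CLUSTER:
            # Keep ordinary CI off the cluster that holds the credentials.
            problems.append(f"{job.name}: non-promotion job on {PROMOTION_CLUSTER}")
    return problems
```

A check like this could run as a presubmit on the job configuration itself, so a misrouted promotion job is caught in review rather than in production.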

@spiffxp
Member

spiffxp commented Apr 15, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.21, v1.22 Apr 15, 2021
@spiffxp
Member

spiffxp commented Jul 9, 2021

/milestone v1.23
I'm moving to Blocked as I'm not sure what the status of this is anymore

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.22, v1.23 Jul 9, 2021
@spiffxp
Member

spiffxp commented Sep 29, 2021

/milestone clear
Clearing from milestone, I'm not sure what remains to be done

@k8s-ci-robot k8s-ci-robot removed this from the v1.23 milestone Sep 29, 2021
@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. and removed wg/k8s-infra labels Sep 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
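The triage rules above form a small state machine. A minimal sketch, assuming inactivity is counted in whole days since the last activity or label change, and that human activity such as `/remove-lifecycle stale` resets the clock; `next_lifecycle` is a hypothetical helper, not the bot's implementation.

```python
from typing import Optional

# Days of inactivity required before each transition, per the rules above.
THRESHOLDS = {None: 90, "stale": 30, "rotten": 30}
NEXT_STATE = {None: "stale", "stale": "rotten", "rotten": "closed"}


def next_lifecycle(state: Optional[str], idle_days: int) -> Optional[str]:
    """Return the lifecycle state after `idle_days` without activity.

    state is None (fresh), "stale", or "rotten"; idle_days counts from the
    last activity or the last label change, whichever is later.
    """
    if state not in THRESHOLDS:
        raise ValueError(f"unexpected state: {state!r}")
    if idle_days >= THRESHOLDS[state]:
        return NEXT_STATE[state]
    return state
```

This thread follows that pattern: it was marked stale on Jun 27, 2022 and, with no further activity, rotten 30 days later on Jul 27, 2022.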

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 28, 2021
@cpanato
Member

cpanato commented Dec 29, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 29, 2022
@riaankleinhans
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 29, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 27, 2022
@ameukam
Member

ameukam commented Aug 19, 2022

I think we are done with this. Image promotion is now part of the release process and used by the different SIGs and subprojects.

Thank you everyone for the work done!
/close

@k8s-ci-robot
Contributor

@ameukam: Closing this issue.

In response to this:

I think we are done with this. Image promotion is now part of the release process and used by the different SIGs and subprojects.

Thank you everyone for the work done!
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
