
Delete old resources during upgrade when a component no longer includes them #374

Closed
ghost opened this issue Aug 4, 2021 · 5 comments
Labels: kind/feature (Categorizes issue or PR as related to a new feature.)

@ghost commented Aug 4, 2021

Feature request

When a component removes a resource from its release build, it would be great to have that resource removed during the operator's upgrade cycle.

Use case

In Tekton Pipelines 0.19 we introduced a PodDisruptionBudget object to try and support "high availability" out of the box. Later, in 0.22, we removed the PDB because it became apparent that it could block node draining in a user's cluster. Pipeline's release.yaml changed from one release to the next - first including the PDB object and then excluding it.

If a user had installed 0.19 then later upgraded to 0.22 the PDB would stick around in the tekton-pipelines namespace. This is because running kubectl apply -f release.yaml doesn't perform any kind of pruning. There are two options that I'm aware of atm to work around this:

  1. There's an alpha flag, --prune, which might support this? I can't tell precisely but it sounds like what I'm describing here.
  2. There's an approach where the user first runs kubectl delete -f <previous release.yaml> and then runs kubectl apply -f <new release.yaml> (see the sketch below). I guess this results in a short window of downtime while the resources turn down and then come back up, but it certainly seems like it'll be precise when cleaning up.
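For concreteness, here's a minimal sketch of both options, assuming the standard published release manifests and assuming the resources carry the app.kubernetes.io/part-of=tekton-pipelines label that --prune would need as its selector (both the label and the exact URLs are assumptions worth verifying):

```sh
# Option 1: the alpha --prune flag. Objects that match the selector but are
# absent from the applied manifest get deleted. The label is an assumption --
# it must actually be present on the release's resources for pruning to work.
kubectl apply --prune -l app.kubernetes.io/part-of=tekton-pipelines \
  -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.22.0/release.yaml

# Option 2: delete the old release first, then apply the new one. Precise
# cleanup, at the cost of a brief window of downtime between the two steps.
kubectl delete -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.19.0/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.22.0/release.yaml
```

Note that option 2 would also delete the CRDs shipped in release.yaml, and with them any user-created Pipeline/Task instances — the caveat raised in the next comment.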

I'm not sure what the best approach for the Operator to take is - maybe there's a way to diff the new version with the resources in the targetNamespace or something like that?

@ghost added the kind/feature label on Aug 4, 2021
@nikhil-thomas (Member) commented

This is definitely something we need as part of the "seamless upgrades" story.

  1. There's an approach where the user first runs kubectl delete -f <previous release.yaml> and then runs kubectl apply -f <new release.yaml>. I guess this results in a short window of downtime while the resources turn down and then come back up, but it certainly seems like it'll be precise when cleaning up.

This is definitely a feasible approach. However, we will have to make sure that we don't delete CRDs, as we don't want all Pipeline and Triggers CRD instances to be lost. I assume it will be totally safe to remove all the namespaced resources in tekton-pipelines and then recreate them. We could even consider setting up a new namespace with the new version of a component and then deleting the previous namespace, like a node replacement in a Kubernetes upgrade (here, my imagination did go a little wild :D ).

Cleaning up cluster-scoped resources could be handled separately. One approach could be: instead of using the TektonConfig instance as the ownerReference of all the resources created/managed by the operator, we could create a ConfigMap with some data pertaining to an update (version, date, etc.) and use that as the owner reference for all the resources created by the operator. When a new upgrade happens, we create a new ConfigMap and update the ownerReference in all the resources that are still relevant. Then we can delete the old "owner" ConfigMap and all the obsolete resources should be garbage-collected 😇 . (TektonConfig could be the ownerReference for the "owner" ConfigMap.)
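A rough sketch of that idea, with every name, UID, and data key invented for illustration. One wrinkle: Kubernetes garbage collection only lets cluster-scoped dependents reference cluster-scoped owners, so a namespaced ConfigMap could not directly own cluster-scoped resources — the "owner" object would itself need to be cluster-scoped for that part:

```yaml
# Hypothetical "owner" ConfigMap recording the current install.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tekton-owner-v0-22-0
  namespace: tekton-pipelines
data:
  version: "0.22.0"
  installedOn: "2021-08-04"
---
# A namespaced resource managed by the operator, owned by the ConfigMap above.
# On upgrade, the operator creates a new owner ConfigMap, re-points the
# ownerReferences of resources that are still relevant, and deletes the old
# ConfigMap; anything still pointing at it is garbage-collected.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tekton-pipelines-controller
  namespace: tekton-pipelines
  ownerReferences:
    - apiVersion: v1
      kind: ConfigMap
      name: tekton-owner-v0-22-0
      uid: 11111111-2222-3333-4444-555555555555  # must match the ConfigMap's real UID
```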

@nikhil-thomas (Member) commented

cc @sm43

@nikhil-thomas (Member) commented

The following PRs have fixed this:

In short, a new CRD, TektonInstallerSet, is introduced. It decouples the ownerReference of all created resources from top-level CRDs like TektonPipeline and sets the owner references to instances of TektonInstallerSet. For each upgrade, a new TektonInstallerSet is created and the old one is deleted.

In effect, all resources except CRDs (Pipeline, Task, Trigger, ...) will be deleted along with the old TektonInstallerSet instance and recreated from the upgraded manifest by creating a new TektonInstallerSet instance. This ensures that obsolete resources are removed automatically with each upgrade, as the TektonInstallerSet instance owning them gets deleted in the upgrade.
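For illustration, a cluster-scoped resource materialized from the component manifest would end up carrying an ownerReference along these lines (the InstallerSet name and UID are invented; real InstallerSets are generated by the operator):

```yaml
# Owned by the TektonInstallerSet created for this upgrade rather than by
# TektonPipeline directly. Deleting the old InstallerSet garbage-collects
# every resource it owns; a fresh one is created from the upgraded manifest.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tekton-pipelines-controller-cluster-access
  ownerReferences:
    - apiVersion: operator.tekton.dev/v1alpha1
      kind: TektonInstallerSet
      name: pipeline-hx7vn  # generated name, example only
      uid: 99999999-8888-7777-6666-555555555555
rules: []  # actual rules elided
```

Because TektonInstallerSet is itself cluster-scoped, it can validly own both namespaced and cluster-scoped resources.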

Note: This idea is similar to the one mentioned in the comment above. The TektonInstallerSet CRD is more like a value-added "cluster-scoped ConfigMap" with its own controller.

@nikhil-thomas (Member) commented

/close

@tekton-robot (Contributor) commented

@nikhil-thomas: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
