Skip to content
This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Add intelligent kustomize sorting in apply and delete #386

Merged
merged 2 commits into from
Aug 21, 2020

Conversation

Jeffwan
Copy link
Member

@Jeffwan Jeffwan commented Jul 30, 2020

Resolve #385 and kubeflow/manifests#1421

For single application or stacks, kustomize apply/delete order is not guaranteed and it depends on the kustomize file definition. It brings lots of problems as #385 describes.

Add ordering here to help address them. With this PR, apply and deletion becomes more smooth.

It takes 11 mins to finish deletion in the past. (with all cert-manager, kf-serving, knative, istio-system, kubeflow)
Right now, it only takes 4~5 minutes.


Update: did some test, another case we want to handle is when we delete profile crd and controller, we want some namespaces to be cleaned up, otherwise, it will get stuck there.

I move CRD under APiService and webhooks to leverage finalizers. After this change, everything works perfect.

@kubeflow-bot
Copy link

This change is Reviewable

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 30, 2020

/hold

@PatrickXYS
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 31, 2020
@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 31, 2020

/hold cancel

@PatrickXYS
Copy link
Member

/lgtm

@PatrickXYS
Copy link
Member

@jlewi @animeshsingh Can you take a look at PR?

We think it will efficiently improve User Experience when installing/deleting Kubeflow.

@adrian555
Copy link
Member

@Jeffwan I am trying to understand where this PR can benefit. If we look at the output of the kfctl apply command, we can see that the resources are created in a sorted flavor already, as following example for kubeflow-apps:

INFO[0119] Deploying application kubeflow-apps           filename="kustomize/kustomize.go:248"
2020/07/31 15:10:21 nil value at `valueFrom.configMapKeyRef.name` ignored in mutation attempt
2020/07/31 15:10:21 nil value at `valueFrom.secretKeyRef.name` ignored in mutation attempt
customresourcedefinition.apiextensions.k8s.io/experiments.kubeflow.org created
customresourcedefinition.apiextensions.k8s.io/notebooks.kubeflow.org created
...
mutatingwebhookconfiguration.admissionregistration.k8s.io/admission-webhook-mutating-webhook-configuration created
mutatingwebhookconfiguration.admissionregistration.k8s.io/seldon-mutating-webhook-configuration-kubeflow created
serviceaccount/admission-webhook-service-account created
serviceaccount/argo created
...
role.rbac.authorization.k8s.io/centraldashboard created
role.rbac.authorization.k8s.io/kubeflow-pipelines-cache-deployer-role created
...
clusterrole.rbac.authorization.k8s.io/admission-webhook-cluster-role created
clusterrole.rbac.authorization.k8s.io/admission-webhook-kubeflow-poddefaults-admin created
...
rolebinding.rbac.authorization.k8s.io/centraldashboard created
rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-binding created
...
clusterrolebinding.rbac.authorization.k8s.io/admission-webhook-cluster-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/centraldashboard created
...
configmap/admission-webhook-admission-webhook-parameters created
configmap/jupyter-web-app-jupyter-web-app-config created
...
secret/katib-controller created
secret/katib-mysql-secrets created
secret/metadata-db-secrets created
secret/mlpipeline-minio-artifact created
secret/mysql-secret-fd5gktm75t created
service/admission-webhook-service created
service/argo-ui created
service/cache-server created
service/centraldashboard created
...
deployment.apps/admission-webhook-deployment created
deployment.apps/argo-ui created
...
application.app.k8s.io/argo created
application.app.k8s.io/centraldashboard created
...
certificate.cert-manager.io/admission-webhook-cert created
certificate.cert-manager.io/seldon-serving-cert created
issuer.cert-manager.io/seldon-selfsigned-issuer created
compositecontroller.metacontroller.k8s.io/kubeflow-pipelines-profile-controller created
destinationrule.networking.istio.io/ml-pipeline created
destinationrule.networking.istio.io/ml-pipeline-mysql created
destinationrule.networking.istio.io/ml-pipeline-ui created
destinationrule.networking.istio.io/ml-pipeline-visualizationserver created
virtualservice.networking.istio.io/argo-ui created
virtualservice.networking.istio.io/centraldashboard created
virtualservice.networking.istio.io/jupyter-web-app created
virtualservice.networking.istio.io/katib-ui created
virtualservice.networking.istio.io/kfam created
virtualservice.networking.istio.io/metadata-grpc created
virtualservice.networking.istio.io/metadata-ui created
virtualservice.networking.istio.io/ml-pipeline-ui created
servicerole.rbac.istio.io/cache-server created
servicerole.rbac.istio.io/ml-pipeline-services created
servicerole.rbac.istio.io/ml-pipeline-ui created
servicerolebinding.rbac.istio.io/bind-cache-server-admission-webhook created
servicerolebinding.rbac.istio.io/bind-gateway-ml-pipeline-ui created
servicerolebinding.rbac.istio.io/bind-ml-pipeline-internal created
persistentvolumeclaim/katib-mysql created
persistentvolumeclaim/metadata-mysql created
persistentvolumeclaim/minio-pv-claim created
persistentvolumeclaim/minio-pvc created
persistentvolumeclaim/mysql-pv-claim created
validatingwebhookconfiguration.admissionregistration.k8s.io/seldon-validating-webhook-configuration-kubeflow created
INFO[0241] Successfully applied application kubeflow-apps  filename="kustomize/kustomize.go:273"

This order looks quite fine to me. So are you saying that this sorting is random and may actually cause problem?

However I can sense the problem likely with the kfctl delete if the resources are not deleted in a reversed order. So instead of adding new logic to sort with overhead, how about just changing the logic in

kustomize.go

		for _, r := range resources {

			err := utils.DeleteResource(r, kubeclient, 5*time.Minute)
			if err != nil {
				msg := fmt.Sprintf("error evaluating kustomization manifest for %v: %v", app.Name, err)
				errList = append(errList, errors.New(msg))
				log.Warn(msg)
			}
		}

to delete the resource in a reversed order?

Another reason to just go with the kustomize is to be sure that all resource kinds are taken care. For example, the ServiceRoleBinding is not in your InstallOrder array.

And if kustomize is sorting the resources not in a right order, maybe we can create issue there?

What do you think?

Thanks.

@Jeffwan
Copy link
Member Author

Jeffwan commented Aug 3, 2020

@adrian555

Thanks for helping review the PR. It depends on how you organize applications in one stack. In your example, you don't have namespace manifest, so deletion will be easier. Can you check your logs on Deleting Kind 'Namespace' in APIVersion 'v1' with name knative, Istio-system will take time and may get time out warning.

Another reason to just go with the kustomize is to be sure that all resource kinds are taken care. For example, the ServiceRoleBinding is not in your InstallOrder array.

yeah, this is true. The list won't cover everything especially on the CRD. That's why we put CRD the end of list in InstallOrder. The major purpose is to make sure APIServer and webhook can be deleted in the right order.

to delete the resource in a reversed order?
This controls kustomize application level, inside each application, it uses default order. For example, cert-manager create namespace, deployment, service. In uninstallation, it use namespace -> deployment -> service order as well. This is not best practice because we don't want to delete namespace directly, especially there's APIService and webhook defined there.

@Jeffwan
Copy link
Member Author

Jeffwan commented Aug 20, 2020

@adrian555 Any other concerns? If not, I will go merge the PR.

@adrian555
Copy link
Member

/lgtm

@Jeffwan
Copy link
Member Author

Jeffwan commented Aug 21, 2020

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ab3f8a1 into kubeflow:master Aug 21, 2020
PatrickXYS pushed a commit to PatrickXYS/kfctl that referenced this pull request Oct 29, 2020
* Add intelligent kustomize sorting in apply and delete

* Adjust CRD sequence to leverage finalizer to cleanup created resources
crobby pushed a commit to crobby/kfctl that referenced this pull request Feb 25, 2021
* Add intelligent kustomize sorting in apply and delete

* Adjust CRD sequence to leverage finalizer to cleanup created resources
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize Kfctl manifest deletion order
5 participants