
fix: Ensure delete.sh script works properly with implicit namespace #217

Merged (1 commit into main from fix-delete-ns, Aug 29, 2022)

Conversation

@njhill (Member) commented Aug 25, 2022

Motivation

The delete.sh cleanup script doesn't require a namespace to be provided via the -n option and otherwise uses the current kubectl context's namespace. However, the $namespace variable wasn't set in this case, meaning later parts of the script might not work as intended.

Modifications

Ensure the $namespace variable is set correctly in both cases.

Result

The delete.sh script works properly when the -n option isn't used.
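For reference, a rough sketch of the intended fallback (not necessarily the exact patch; the jsonpath lookup is just one common way to read the current context's namespace):

# Sketch only: fall back to the current kubectl context's namespace
# when -n/--namespace isn't provided.
if [ -z "$namespace" ]; then
  namespace=$(kubectl config view --minify --output 'jsonpath={..namespace}')
  namespace=${namespace:-default}  # contexts with no namespace set resolve to "default"
fi
echo "deleting in namespace: $namespace"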


Signed-off-by: Nick Hill <[email protected]>
@chinhuang007 (Contributor) left a comment


/lgtm

@kserve-oss-bot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chinhuang007, njhill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kserve-oss-bot kserve-oss-bot merged commit 25e6704 into main Aug 29, 2022
@ckadner (Member) commented Aug 30, 2022

@njhill -- Sorry I am late to the party. I just tried the new script and I am not getting the result I was expecting. I ran the script without the namespace flag in the default namespace. User error?

I start out with a working "quickstart" deployment:

$ kubectl get ServingRuntimes

NAME           DISABLED   MODELTYPE     CONTAINERS   AGE
mlserver-0.x              sklearn       mlserver     11d
ovms-1.x                  openvino_ir   ovms         11d
triton-2.x                keras         triton       11d

Set the current namespace to default:

$ kubectl config set-context --current --namespace=default

Context "ckadner-modelmesh-dev/cbu3tugd0r5j02te049g" modified.

No ServingRuntimes in default namespace:

$ kubectl get ServingRuntimes

No resources found in default namespace.

Run the delete script without the namespace option:

$ ./scripts/delete.sh 

current namespace: default
deleting in namespace: default
~/GolandProjects/modelmesh-serving_ckadner/config/default ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config/rbac/namespace-scope ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
customresourcedefinition.apiextensions.k8s.io "inferenceservices.serving.kserve.io" deleted
customresourcedefinition.apiextensions.k8s.io "predictors.serving.kserve.io" deleted
customresourcedefinition.apiextensions.k8s.io "servingruntimes.serving.kserve.io" deleted
Error: unknown flag: --load-restrictor
No resources found

3 Modelmesh pods still running:

$ kubectl get pods --all-namespaces | grep model

modelmesh-serving     etcd-7dbb56b4d9-fxmd6                                 1/1     Running   0             11d
modelmesh-serving     minio-6c55d85f46-kzfnp                                1/1     Running   0             11d
modelmesh-serving     modelmesh-controller-7f7fb577db-5c8c9                 1/1     Running   0             11d

Try the delete script again after setting current namespace to modelmesh-serving:

$ kubectl config set-context --current --namespace=modelmesh-serving

Context "ckadner-modelmesh-dev/cbu3tugd0r5j02te049g" modified.

Run the delete script again:

$ ./scripts/delete.sh 

current namespace: modelmesh-serving
deleting in namespace: modelmesh-serving
~/GolandProjects/modelmesh-serving_ckadner/config/default ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config/rbac/namespace-scope ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
deleting cluster scope RBAC
serviceaccount "modelmesh" deleted
serviceaccount "modelmesh-controller" deleted
role.rbac.authorization.k8s.io "modelmesh-controller-leader-election-role" deleted
role.rbac.authorization.k8s.io "modelmesh-controller-restricted-scc-role" deleted
clusterrole.rbac.authorization.k8s.io "modelmesh-controller-role" deleted
rolebinding.rbac.authorization.k8s.io "modelmesh-controller-leader-election-rolebinding" deleted
rolebinding.rbac.authorization.k8s.io "modelmesh-controller-restricted-scc-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "modelmesh-controller-rolebinding" deleted
networkpolicy.networking.k8s.io "modelmesh-controller" deleted
networkpolicy.networking.k8s.io "modelmesh-runtimes" deleted
configmap "model-serving-config-defaults" deleted
deployment.apps "modelmesh-controller" deleted
Error: unknown flag: --load-restrictor
No resources found

2 modelmesh-serving pods still running:

$ kubectl get pods --all-namespaces | grep model

modelmesh-serving     etcd-7dbb56b4d9-fxmd6                                 1/1     Running   0             11d
modelmesh-serving     minio-6c55d85f46-kzfnp                                1/1     Running   0             11d

Run the delete script again with the namespace flag:

$ ./scripts/delete.sh --namespace modelmesh-serving

current namespace: modelmesh-serving
Context "ckadner-modelmesh-dev/cbu3tugd0r5j02te049g" modified.
deleting in namespace: modelmesh-serving
~/GolandProjects/modelmesh-serving_ckadner/config/default ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config/rbac/namespace-scope ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "modelmesh-controller-rolebinding" not found
Error: unknown flag: --load-restrictor
No resources found

2 Modelmesh pods still running:

$ kubectl get pods --all-namespaces | grep model

modelmesh-serving     etcd-7dbb56b4d9-fxmd6                                 1/1     Running   0             11d
modelmesh-serving     minio-6c55d85f46-kzfnp                                1/1     Running   0             11d

@ckadner (Member) commented Aug 30, 2022

I noticed I was using an old version of kustomize (3.2).

I redeployed the Modelmesh Quickstart, this time with kustomize v4.5.6, and ran the delete script again without specifying the namespace flag:

$ ./scripts/delete.sh 

current namespace: modelmesh-serving
deleting in namespace: modelmesh-serving
~/GolandProjects/modelmesh-serving_ckadner/config/default ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config/rbac/namespace-scope ~/GolandProjects/modelmesh-serving_ckadner/config
~/GolandProjects/modelmesh-serving_ckadner/config
deleting cluster scope RBAC
serviceaccount "modelmesh" deleted
serviceaccount "modelmesh-controller" deleted
role.rbac.authorization.k8s.io "modelmesh-controller-leader-election-role" deleted
role.rbac.authorization.k8s.io "modelmesh-controller-restricted-scc-role" deleted
clusterrole.rbac.authorization.k8s.io "modelmesh-controller-role" deleted
rolebinding.rbac.authorization.k8s.io "modelmesh-controller-leader-election-rolebinding" deleted
rolebinding.rbac.authorization.k8s.io "modelmesh-controller-restricted-scc-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "modelmesh-controller-rolebinding" deleted
networkpolicy.networking.k8s.io "modelmesh-controller" deleted
networkpolicy.networking.k8s.io "modelmesh-runtimes" deleted
customresourcedefinition.apiextensions.k8s.io "inferenceservices.serving.kserve.io" deleted
customresourcedefinition.apiextensions.k8s.io "predictors.serving.kserve.io" deleted
customresourcedefinition.apiextensions.k8s.io "servingruntimes.serving.kserve.io" deleted
configmap "model-serving-config-defaults" deleted
deployment.apps "modelmesh-controller" deleted
service "etcd" deleted
deployment.apps "etcd" deleted
secret "model-serving-etcd" deleted
service "minio" deleted
deployment.apps "minio" deleted
secret "storage-config" deleted

All modelmesh pods are gone now:

$ kubectl get pods --all-namespaces | grep model

🦗 🦗 🦗

The modelmesh-serving Service is still around though:

$ kubectl get svc --all-namespaces | grep model

modelmesh-serving     modelmesh-serving                    ClusterIP      None             <none>          8033/TCP,8008/TCP,2112/TCP   11d

@njhill njhill deleted the fix-delete-ns branch August 30, 2022 23:54
@njhill (Member Author) commented Aug 30, 2022

Thanks @ckadner. Re the kustomize thing, it looks like in other places we made it compatible with different versions; it would be good to do that here too.
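For example, the script could probe for flag support instead of assuming a particular kustomize version (sketch only, untested; the exact kustomize invocation in delete.sh may differ):

# Only pass --load-restrictor when the installed kustomize supports it
# (kustomize 3.2 doesn't, which is what produced the "unknown flag" error above).
if kustomize build --help 2>&1 | grep -q -- '--load-restrictor'; then
  kustomize build --load-restrictor LoadRestrictionsNone . | kubectl delete -f - --ignore-not-found=true
else
  kustomize build . | kubectl delete -f - --ignore-not-found=true
fi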

That's interesting that the Service wasn't cleaned up. It's created by the controller and should be "owned" by the controller's Deployment, so I'm surprised it wasn't deleted automatically when the controller deployment was deleted. Maybe you could inspect its metadata to see what's going on with the owner reference?
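e.g. something like:

$ kubectl get service modelmesh-serving -n modelmesh-serving -o jsonpath='{.metadata.ownerReferences}'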

@chinhuang007 (Contributor) commented Aug 31, 2022

The service currently belongs to the namespace when ModelMesh is in the default cluster scope mode, because the controller deployment can't own resources in other namespaces. While the service could still be owned by the controller deployment in the controller namespace, the existing use of ServiceReconciler.NamespaceOwned is not sufficient to determine what the owner should be.

This will be the case for user namespaces, meaning deleting ModelMesh resources won't delete the service. This particular behavior is independent of the delete script. Additional logic could be added, I guess, to have the service in the controller namespace owned by the controller deployment while services in user namespaces are owned by the namespace.

@chinhuang007 (Contributor)

The service should be cleaned up when ModelMesh is installed with --namespace-scope-mode.

@njhill (Member Author) commented Aug 31, 2022

Ah, good point @chinhuang007, my head was still in namespace scope. I guess the main thing would be to make sure the delete script explicitly cleans up everything it should when in cluster scope mode, e.g. using a selector or something. We should probably open a minor bug issue for this.
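For example (the label selector here is hypothetical, just to illustrate the idea; it would need to match whatever labels the controller actually sets on the resources it creates):

$ kubectl delete service,deployment,configmap,secret -n "$namespace" -l app.kubernetes.io/managed-by=modelmesh-controller --ignore-not-found=true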
