Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design-proposal: VM delete protection #363

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jcanocan
Copy link
Contributor

@jcanocan jcanocan commented Dec 4, 2024

What this PR does / why we need it:

The proposal introduces a mechanism to protect VMs from being deleted. By adding the annotation kubevirt.io/vm-delete-protection and with a value of true or True to the VM object, the VM cannot be deleted by any means. This would prevent accidental deletions in VMs that users considers crucial for their applications/clusters.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes # CNV-48696

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

None

@kubevirt-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign davidvossel for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Dec 4, 2024
Differences between options are negligible in any scenario analyzed. Moreover, **no performance degradation has been
observed.**

Given the pros and cons, **OPTION \#1 is preferred in this case.**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we go with this option then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, yes, we should go with this option.

The VM protection is implemented by adding
a [ValidationAdmissionPolicy and ValidationAdmissionPolicyBinding](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/)
objects. On every delete action against a VM object, the ValidatingAdmissionPolicy will check for presence of the
annotation. If it is present and its value equals “true” or “True” the request will be rejected. The

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to explain its behavior in scenarios where it is misconfigured?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might create a really verbose and noisy outputs IMO. Documentation should cover how to properly configure the feature, and it should make clear that other configurations/values, will allow the deletion.

@vladikr
Copy link
Member

vladikr commented Dec 4, 2024

/cc

@kubevirt-bot kubevirt-bot requested a review from vladikr December 4, 2024 14:33
@jcanocan jcanocan force-pushed the vm-delete-protection branch from 45dc46b to 510b378 Compare December 5, 2024 09:44
@jcanocan
Copy link
Contributor Author

jcanocan commented Dec 5, 2024

v2:

  • Included Section about Finalizers.

Copy link
Contributor

@jean-edouard jean-edouard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm not against this (as long as option 1 is used), I don't quite see the point.
Are we trying to protect PVCs that would have the VM as a parent? In that case, I think we should take steps to protect the PVCs directly instead.
The VM spec itself really isn't all that precious, and would cost just a few kilobytes of storage to backup somewhere if it really did contain special data.

This proposal sounds like there are interesting field stories associated with it, if so could you maybe share a concrete example of what happened?

@codingben
Copy link
Member

While I'm not against this (as long as option 1 is used), I don't quite see the point. Are we trying to protect PVCs that would have the VM as a parent?

No. We're trying to protect VMs from deletion by accident, e.g. by sysadmins that may run business critical applications or services (e.g. SQL server).

@jcanocan jcanocan force-pushed the vm-delete-protection branch 2 times, most recently from e1c6873 to 355ae1a Compare December 9, 2024 10:36
@jean-edouard
Copy link
Contributor

While I'm not against this (as long as option 1 is used), I don't quite see the point. Are we trying to protect PVCs that would have the VM as a parent?

No. We're trying to protect VMs from deletion by accident, e.g. by sysadmins that may run business critical applications or services (e.g. SQL server).

Sure but what do you mean by "VMs"? What, if not disks, are we afraid to lose?
From your answer, it sounds like we're more trying to protect against VM shutdown rather than VM deletion, maybe that's what the annotation should be about?

@jcanocan
Copy link
Contributor Author

jcanocan commented Dec 9, 2024

While I'm not against this (as long as option 1 is used), I don't quite see the point. Are we trying to protect PVCs that would have the VM as a parent?

No. We're trying to protect VMs from deletion by accident, e.g. by sysadmins that may run business critical applications or services (e.g. SQL server).

Sure but what do you mean by "VMs"? What, if not disks, are we afraid to lose? From your answer, it sounds like we're more trying to protect against VM shutdown rather than VM deletion, maybe that's what the annotation should be about?

As stated in the design document, what we are trying to protect is the VirtualMachine object from deletion.
Please note that this is a costumer request: https://issues.redhat.com/browse/CNV-46398?focusedId=25412651&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-25412651
So, what we would like to achieve is a deletion protection for VM objects similar to RHV or other cloud providers have.
While I think your point about protecting VM from shutting down is valid, users look more concerned about accidental deletions.

@jean-edouard
Copy link
Contributor

While I'm not against this (as long as option 1 is used), I don't quite see the point. Are we trying to protect PVCs that would have the VM as a parent?

No. We're trying to protect VMs from deletion by accident, e.g. by sysadmins that may run business critical applications or services (e.g. SQL server).

Sure but what do you mean by "VMs"? What, if not disks, are we afraid to lose? From your answer, it sounds like we're more trying to protect against VM shutdown rather than VM deletion, maybe that's what the annotation should be about?

As stated in the design document, what we are trying to protect is the VirtualMachine object from deletion. Please note that this is a costumer request: https://issues.redhat.com/browse/CNV-46398?focusedId=25412651&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-25412651 So, what we would like to achieve is a deletion protection for VM objects similar to RHV or other cloud providers have. While I think your point about protecting VM from shutting down is valid, users look more concerned about accidental deletions.

Thank you for the pointer. Please note that access to the comment you linked to is restricted. The important part of it is that Having some form of delete protection would provide peace of mind against sysadmin accidents.
With great power comes great responsibility... No matter what we do, kubectl delete -f will bypass it. We're also talking about protecting something that can easily be re-created (a VM spec).
That being said, I don't feel strongly against this, as the maintenance burden seem minimal.
@vladikr WDYT?

@jcanocan
Copy link
Contributor Author

jcanocan commented Dec 9, 2024

While I'm not against this (as long as option 1 is used), I don't quite see the point. Are we trying to protect PVCs that would have the VM as a parent?

No. We're trying to protect VMs from deletion by accident, e.g. by sysadmins that may run business critical applications or services (e.g. SQL server).

Sure but what do you mean by "VMs"? What, if not disks, are we afraid to lose? From your answer, it sounds like we're more trying to protect against VM shutdown rather than VM deletion, maybe that's what the annotation should be about?

As stated in the design document, what we are trying to protect is the VirtualMachine object from deletion. Please note that this is a costumer request: https://issues.redhat.com/browse/CNV-46398?focusedId=25412651&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-25412651 So, what we would like to achieve is a deletion protection for VM objects similar to RHV or other cloud providers have. While I think your point about protecting VM from shutting down is valid, users look more concerned about accidental deletions.

Thank you for the pointer. Please note that access to the comment you linked to is restricted. The important part of it is that Having some form of delete protection would provide peace of mind against sysadmin accidents.

Yes, this is a good summary. Thanks!

With great power comes great responsibility... No matter what we do, kubectl delete -f will bypass it.

Just one note, both approaches mentioned in this design document does not allow the force deletion either. You could check it in the PoCs linked.

We're also talking about protecting something that can easily be re-created (a VM spec). That being said, I don't feel strongly against this, as the maintenance burden seem minimal. @vladikr WDYT?

I agree, but from the user point of view, they will have the guarantee that the VM will not be deleted no matter what.

Copy link
Member

@stu-gott stu-gott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I don't see why this is needed. I've seen reference to tekton pipelines as a justification, but I don't understand what the real problem is there.


### Enable the VM Object Protection

To protect VMs against deletion, they need to be annotated with `kubevirt.io/vm-delete-protection` set to `true` or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd much rather have a first-class setting in the VM. Documentation of annotations tends to be fragmented and doesn't lead to a very good user experience. Making a dedicated setting in the spec means it will be properly documented in swagger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for your feedback, much appreciated.

While I'm not against this idea, I can see potential challenges in the future. IMHO, a good documentation entry should be enough. The API field vs an annotation has the following advantages:

  • Low maintenance, no need to maintain a new API field.
  • Extensible, it would be just a matter of adding new resourcesRules to the existing VAP if we would like to protect other KubeVirt objects. Creating a dedicated API field will force us to complicate the VAP and even to move the API setting from different places (creating room for potential backwards compatibility challenges).

@jcanocan
Copy link
Contributor Author

Overall I don't see why this is needed. I've seen reference to tekton pipelines as a justification, but I don't understand what the real problem is there.

IMHO, Jed summed it up really well here: #363 (comment)
Looks like users accidentally delete VMs more that they would like to. Therefore, they are missing some delete protection in the VM objects. Most cloud providers have this option, e.g., https://cloud.google.com/compute/docs/instances/preventing-accidental-vm-deletion

Copy link
Member

@lyarwood lyarwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be an invalid use case but what about using status and a subresource API to control this behaviour to allow finer grain RBAC control?


## User Stories

As a user who has the permissions to delete VMs, I want to protect my VMs from accidental deletion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd simplify this and just say As a VM Owner, the fact the owner has permissions to delete the VM isn't relevant to the fact that they want to protect it from accidental deletion by them or a third party.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


1. Removing the `kubevirt.io/vm-delete-protection` annotation with the following patch operation:
```bash
$ kubectl patch vm <vm-name> --type=json -p='[{"op": "remove", "path": "/metadata/annotations/kubevirt.io~1vm-delete-protection"}]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ultimately means that anyone with the ability to update/patch the VM can enable or disable this functionality. Have we considered placing it in status controlled by a subresource with finer grain RBAC contol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but I assume this is the same concern raised in the top comment of your review.

I didn't think about that TBH, but it feels a bit overkill IMHO. However, what would be the ultimate goal of doing that? I'm not sure if this particular use case requires it. Please, elaborate a bit if you think otherwise.

In any case, if we find out this is required, we could implement it in a follow-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ultimate goal would be to allow a smaller subset of users (super admins?) to have access to the delete protection toggle than have the ability to update the VM object itself. It's extra work but given the use case here I'm not sure we want to give everyone access to this toggle right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ultimate goal would be to allow a smaller subset of users (super admins?) to have access to the delete protection toggle than have the ability to update the VM object itself.

@lyarwood this feature is only about the single persona, the VM owner with "fat fingers". An additional role would be not helpful in this use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dominikholler well it's any user with DELETE privileges against the VM surely so not just an owner but anyone in the same namespace or admin. It also doesn't protect against that same user accidentently removing the label and thus the protection before requesting the deletion.

Moving the toggle into the status would avoid this as users wouldn't be able to touch it without going through the sub resource API that itself could be limited to a specific owner and/or super set of admins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also doesn't protect against that same user accidentently removing the label and thus the protection before requesting the deletion.

This would require two steps, this feature is just protecting from a single unintended step, e.g. a `kubectl delete vm/xxx" .

@jcanocan jcanocan force-pushed the vm-delete-protection branch from 355ae1a to f734d3f Compare December 10, 2024 13:46
@vladikr
Copy link
Member

vladikr commented Dec 10, 2024

Thank you for taking the time to write this proposal @jcanocan.

At first, I thought that this might be a good idea. Then I became conflicted as it seems that ultimately we are protecting these VMs from the cluster admins.
Those who have the ultimate responsibility in the cluster...
Aren't we going to ruin the hierarchy in the cluster if we allow users to prevent an admin operation?

If we really need such a layer of protection, I would imagine this feature working from another direction.
Perhaps a user could provide a known "hint" (in the form of a label) and the admin could create a list of protected VMs that have label X.

@jcanocan
Copy link
Contributor Author

Thank you for taking the time to write this proposal @jcanocan.

At first, I thought that this might be a good idea. Then I became conflicted as it seems that ultimately we are protecting these VMs from the cluster admins. Those who have the ultimate responsibility in the cluster... Aren't we going to ruin the hierarchy in the cluster if we allow users to prevent an admin operation?

If we really need such a layer of protection, I would imagine this feature working from another direction. Perhaps a user could provide a known "hint" (in the form of a label) and the admin could create a list of protected VMs that have label X.

We could enable the protection by setting a label like:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm-delete-protection: "true"

In this way, as you mentioned, the cluster-admin could get a list of VMs with the protection enabled with something like:

kubectl get vm --selector=kubevirt.io/vm-delete-protection=true

WDYT? Did I get your idea right?

Thanks for your input.

@vladikr
Copy link
Member

vladikr commented Dec 11, 2024

@jcanocan What I had in mind is something closer to MigrationPolicy or a CR configuration similar to PermittedHostDevices

The idea is to separate the concerns of each persona.
The VM owner would "hint" by setting a label:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/please-avoid-deleting: "true"

Then the admin would set the rule that would prevent the deletion of the associated VMs with the listed label :

preventVMsDeletion:
  selector: kubevirt.io/please-avoid-deleting

@vladikr
Copy link
Member

vladikr commented Dec 12, 2024

@jcanocan What I had in mind is something closer to MigrationPolicy or a CR configuration similar to PermittedHostDevices

The idea is to separate the concerns of each persona. The VM owner would "hint" by setting a label:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/please-avoid-deleting: "true"

perhaps the label/annotations can be protected with cel? @xpivarc

@vladikr
Copy link
Member

vladikr commented Dec 12, 2024

Might be an invalid use case but what about using status and a subresource API to control this behaviour to allow finer grain RBAC control?

It's an interesting idea. How would this work with newly created VMs? Doesn't it have to be only a secondary action?

@dominikholler
Copy link
Contributor

@jcanocan What I had in mind is something closer to MigrationPolicy or a CR configuration similar to PermittedHostDevices

The idea is to separate the concerns of each persona. The VM owner would "hint" by setting a label:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/please-avoid-deleting: "true"

Then the admin would set the rule that would prevent the deletion of the associated VMs with the listed label :

preventVMsDeletion:
  selector: kubevirt.io/please-avoid-deleting

@vladikr Isn't this already working out-of-the-box using ValidatingAdmissionPolicy ?

This feature is about having a fixed marking (e.g. label or annotation) so that the UI can configure a marking for a pre-defined rule.

@dominikholler
Copy link
Contributor

Might be an invalid use case but what about using status and a subresource API to control this behaviour to allow finer grain RBAC control?

It's an interesting idea. How would this work with newly created VMs? Doesn't it have to be only a secondary action?

In which way would help this idea to help the admin with "fat-fingers" ?

@vladikr
Copy link
Member

vladikr commented Dec 12, 2024

@jcanocan What I had in mind is something closer to MigrationPolicy or a CR configuration similar to PermittedHostDevices
The idea is to separate the concerns of each persona. The VM owner would "hint" by setting a label:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/please-avoid-deleting: "true"

Then the admin would set the rule that would prevent the deletion of the associated VMs with the listed label :

preventVMsDeletion:
  selector: kubevirt.io/please-avoid-deleting

@vladikr Isn't this already working out-of-the-box using ValidatingAdmissionPolicy ?

Please elaborate on what is working out of the box. :)
I mentioned CEL (and I meant ValidatingAdmissionPolicy ) in my previous comment

I'm only trying to emphasize that we shouldn't now allow users to block admin operations without the admins' consent or/and a way to override this.

We achieve this separation either with ValidatingAdmissionPolicy or with a policy CRD that the admin would install or a CR config.

This feature is about having a fixed marking (e.g. label or annotation) so that the UI can configure a marking for a pre-defined rule.

This proposal is for KubeVirt which doesn't have a UI... but in all cases, I see a label or an annotation being used to request this feature.

@vladikr
Copy link
Member

vladikr commented Dec 12, 2024

Might be an invalid use case but what about using status and a subresource API to control this behaviour to allow finer grain RBAC control?

It's an interesting idea. How would this work with newly created VMs? Doesn't it have to be only a secondary action?

In which way would help this idea to help the admin with "fat-fingers" ?

Not by itself, but this is just another way to limit the configuration of this feature to cluster admins and vm owners.

@codingben
Copy link
Member

This proposal is for KubeVirt which doesn't have a UI...

I think we should get rid of mentioning "UI" in the design proposal. I've already opened discussion about it earlier this week: #363 (comment)

@lyarwood
Copy link
Member

Might be an invalid use case but what about using status and a subresource API to control this behaviour to allow finer grain RBAC control?

It's an interesting idea. How would this work with newly created VMs? Doesn't it have to be only a secondary action?

Yeah I can't think of a clean way of providing it with the initial definition as you'd always have to remove it once populated in status to ensure a single source of truth.

In which way would help this idea to help the admin with "fat-fingers" ?

Not by itself, but this is just another way to limit the configuration of this feature to cluster admins and vm owners.

Yup it makes the act more deliberate by forcing an admin or whomever it is with access to the subresource to first toggle delete protection there and then fire off the delete.

Without this any user with the ability to update the VM object can toggle delete protection and then nuke the VM.

@dominikholler
Copy link
Contributor

Please elaborate on what is working out of the box. :) I mentioned CEL (and I meant ValidatingAdmissionPolicy ) in my previous comment

I'm only trying to emphasize that we shouldn't now allow users to block admin operations without the admins' consent or/and a way to override this.

We achieve this separation either with ValidatingAdmissionPolicy or with a policy CRD that the admin would install or a CR config.

According to my limited technical understanding, the cluster admin can already right now create a ValidatingAdmissionPolicy like this:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:   
  name: "vm-delete-protection"
spec:   
  failurePolicy: Fail
  matchConstraints:     
    resourceRules:     
    - apiGroups:   ["kubevirt.io"]
      apiVersions: ["*"]
      operations:  ["DELETE"]
      resources:   ["virtualmachines"]
  variables: 
    - expression: string('kubevirt.io/vm-delete-protection')
      name: annotation
  validations: 
    - expression: (!(variables.annotation in oldObject.metadata.annotations) || !oldObject.metadata.annotations[variables.annotation].matches('^(true|True)$'))
      messageExpression: "'Object ' + string(oldObject.metadata.name) + ' cannot be deleted, remove delete protection'"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:   
  name: "vm-delete-protection-binding"
spec:   
  policyName: "vm-delete-protection"
  validationActions: [Deny]
  matchResources: 

Now, if the VM owner adds the annotation kubevirt.io/vm-delete-protection: "true" to any VM object, it will be protected against deletion. You can do it by applying the following path to your VM object:

kubectl patch <vm-name> -n <namespace> --type=json -p '[{"op": "add", "path": "/metadata/annotations/kubevirt.io~1vm-delete-protection", "value": "true"}

To reverse the operation and allow deletion, the following command can be used:

kubectl patch vm <vm-name> -n <namespace> --type=json -p '[{"op": "remove", "path": "/metadata/annotations/kubevirt.io~1vm-delete-protection"}]'

the same should work if a label is used instead of the annotation.

This feature is about providing a easy to use, simple and testable interface to the already existing solution.
For this reason the feature would loose its value, if it is introducing more complexity than what already exists.

But I see the point, that there should be a way to introduce a way for the admin to control the deletion protection.

In which way would help this idea to help the admin with "fat-fingers" ?

Not by itself, but this is just another way to limit the configuration of this feature to cluster admins and vm owners.

I see, thanks for the hint!

@dominikholler
Copy link
Contributor

dominikholler commented Dec 13, 2024

Without this any user with the ability to update the VM object can toggle delete protection and then nuke the VM.

Where do you see the issue with this? Isn't it expected that if a user has the VM owners permissions this user is able to delete the VM?

@xpivarc
Copy link
Member

xpivarc commented Dec 13, 2024

@dominikholler The policy you wrote is exactly what you want. Now why should this opinionated (subjective at least to me) policy be part of core Kubevirt? Especially when it is so easy to implement as you demonstrated? Maybe more opinionated component could introduce this?

The proposal introduces a mechanism to protect VMs from being deleted.
By adding the annotation `kubevirt.io/vm-delete-protection` and with a
value of `true` or `True` to the VM object, the VM cannot be deleted by
any means. This would prevent accidental deletions in VMs that users
considers crucial for their applications/clusters.

Signed-off-by: Javier Cano Cano <[email protected]>
@jcanocan jcanocan force-pushed the vm-delete-protection branch from f734d3f to 0d826fd Compare December 16, 2024 10:03
@jcanocan
Copy link
Contributor Author

Update:

  • Added new chosen solution, deploy VAP in SSP.
  • Changed the mechanism to enable the protection from annotation to label
  • Changed affected repos from kubevirt to SSP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.