-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move CSI snapshot KEP into directory and add an extension for tightening validation #1902
Conversation
Welcome @AndiLi99! |
Hi @AndiLi99. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
keps/sig-storage/177-volume-snapshot-validation/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot-validation/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot-validation/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot-validation/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
1. The first phase is webhook-only, and will use ratcheting validation combined with a reconciliation method to delete or fix currently persisted objects which are invalid under the new (strict) validation rules. | ||
2. The second phase occurs once all invalid objects are cleared from the cluster. The CRD schema validation will be tightened and the webhook will stick around to enforce immutability until immutable fields come to CRDs. (or the crd upgrade could wait until immutable fields are available to do in one go) | ||
|
||
The phases come in separate releases to allow users / cluster admin the opportunity to clean their cluster of any invalid objects. More details are in the Risks and Mitigations section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 1 release enough? I'm not quite sure where this falls under the kubernetes deprecation policy. This is changing the behavior of a beta API, but the deprecation policy mostly talks about removing entire API groups. cc @liggitt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this tightening is like fixing bugs. The oneof restriction is already handled by the controller. So I think we should not wait for more than 1 release.
keps/sig-storage/177-volume-snapshot-validation/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
@kubernetes/sig-storage-proposals |
cc @xing-yang |
There's a validation error:
|
@xing-yang who would be the prr reviewer for the original csi snapshot kep? |
Looks like PRRs are new: #1620 There's a note in the changes to "Please reach out on the I'll ask on there what to do. |
The original snapshot KEP didn't go through the PRR process. You can pick one of (deads2k,johnbelamaric,wojtek-t) as PRR reviewer. |
keps/sig-storage/177-volume-snapshot-validation/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
|
||
## Proposal | ||
|
||
Tighten the validation on Volume Snapshot objects. The following fields will begin to enforce immutability: `VolumeSnapshot.Spec.Source` and `VolumeSnapshotContent.Spec.Source`. The following fields will begin to enforce `oneOf`: `VolumeSnapshot.Spec.Source` and `VolumeSnapshotContent.Spec.Source`). The following fields will begin to enforce non-empty strings: `spec.VolumeSnapshotClassName`. More details are in the Validating Scenarios section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also spec.VolumeSnapshotRef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the list with a note that it is immutable only after UID is set. I'm not sure if that behaviour is good though. I feel like we shall either make it fully immutable after creation, or not make it immutable at all. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the VolumeSnapshotRef field says it's immutable after creation, is that inconsistent with the UID being set after creation? Also, I thought the controller created the whole VSC at once (but am I mistaken?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the dynamic snapshot case, the controller creates the entire VSC at once, you are correct.
In the pre-provisioned snapshot case, the CA must create the VSC. They must specify the name and namespace of the VS they wish to be bound to the VSC on the VSC.Spec.VolumeSnapshotRef field. If the VS does not exist yet there is no UID set, The controller after binding will set the UID in the VSC.Spec.VolumeSnapshotRef field. So the controller must be able to change that field after it is created to add set the UID one time.
This is a bit confusing but I'm not sure if anything can be done to change this behaviour.
cc @yuxiangqian to check if I got this behaviour right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. Could you make a quick PR to update the comment in types.go so that it's consistent with what is actually the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what Andi described is accurate, I guess the semantic is more accurate if put this way: VSC.Spec.VolumeSnapshotRef is immutable (more specifically name, namespace, and UID fields) once bound.
for dynamic case, there is little ambiguity, all three fields will be set at once by controller.
for preprovisioned case, name/namespace are the users responsibility to set, while the UID is the controller's responsibility during binding (two stages instead of one compared to dynamic case). I think it's reasonable to allow updating for name/namespace fields BEFORE a content is bound to a VS, i.e., UID == "", for the pre-provisioned case. A potential use case could be:
- CA creates a pre-provisioned VSC with VS-A name = "a", ns = "a-ns" (not bound yet)
- CA realized they need the VSC to point to VS-B name = "b", ns = "b-ns"
- CA updates the VSC to point to VS-B (if VSC has been bound(by checking UID == ""), it should be rejected by webhook, if not, webhook should allow?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think those semantics make sense. But we shouldn't call the field just "immutable" in that case. I like making more precise by saying "immutable after binding" or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm making a quick PR to update the comment in types.go to say it's immutable after UID is set only.
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
Fix typo Fix wrong date for beta of snapshot Change to reference release tag Add `VolumeSnapshotContent.spec.VolumeSnapshotRef` to list of immutable fields with note that it is immutable only after UID is set.
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
| CREATE | spec.Source | Exactly one of PersistentVolumeClaimName (Dynamic) or VolumeSnapshotContentName (Pre-provisioned) should be specified. | 400 | | ||
| UPDATE | spec.Source | Immutable, no updates allowed. If the user has specified an incorrect source, they must delete and remake the snapshot. The webhook validation server will not be able to guarantee that only incorrect sources are allowed to be updated. | 400 | | ||
| CREATE | spec.VolumeSnapshotClassName | Must not be the empty string. Can be unset (to use the default snapshot class, if it is set. If the default snapshot class is not set or there is more than 1 default class, then the hook will allow the creation but the snapshot will fail.), or set to a non-empty string (the snapshot class). | 400 | | ||
| UPDATE | spec.VolumeSnapshotClassName | Same restrictions as CREATE. We won’t restrict updating by making this field immutable (only applying the same restrictions as creation) but this field should only be changed by those who know exactly what they are doing. | 400 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the reason for not restricting updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuxiangqian would know what specific scenarios are valid for changing the class name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a TODO to resolve this by the time we implement the webhook? We should very clearly understand the use cases for supporting mutability of each field, and perhaps even document the use case in the CRD schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one scenario was the following:
- user creates a VS which points to a non-existing volume snapshot class (fat-fingered or whatever reason), in this case the volume snapshot controller will fail the create a snapshot.
- user updates the volume snapshot class to point to a correct one.
IMO it sounds like a reasonable mutation scenario?
If it's a mutable field, then it makes sense to me not to list this entry here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a valid snapshot was created, but then the user changes the class to something different?
I guess there's a similar scenario where a user could delete and create a volumesnapshotclass with the same name, but completely different settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO issue kubernetes-csi/external-snapshotter#362
…users can check if their cluster is okay to upgrade
/approve Will leave it to @xing-yang and @yuxiangqian for final review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM. Great work Andi!
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
| CREATE | spec.Source | Exactly one of PersistentVolumeClaimName (Dynamic) or VolumeSnapshotContentName (Pre-provisioned) should be specified. | 400 | | ||
| UPDATE | spec.Source | Immutable, no updates allowed. If the user has specified an incorrect source, they must delete and remake the snapshot. The webhook validation server will not be able to guarantee that only incorrect sources are allowed to be updated. | 400 | | ||
| CREATE | spec.VolumeSnapshotClassName | Must not be the empty string. Can be unset (to use the default snapshot class, if it is set. If the default snapshot class is not set or there is more than 1 default class, then the hook will allow the creation but the snapshot will fail.), or set to a non-empty string (the snapshot class). | 400 | | ||
| UPDATE | spec.VolumeSnapshotClassName | Same restrictions as CREATE. We won’t restrict updating by making this field immutable (only applying the same restrictions as creation) but this field should only be changed by those who know exactly what they are doing. | 400 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one scenario was the following:
- user creates a VS which points to a non-existing volume snapshot class (fat-fingered or whatever reason), in this case the volume snapshot controller will fail the create a snapshot.
- user updates the volume snapshot class to point to a correct one.
IMO it sounds like a reasonable mutation scenario?
If it's a mutable field, then it makes sense to me not to list this entry here?
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
Due to backwards compatibility concerns, the tightening will occur in three phases. | ||
|
||
1. The first phase is webhook-only, and will use [ratcheting validation](#backwards-compatibility). It will be the user's responsibility to clean up invalid objects which already existed before the webhook was enabled. Invalid objects are those which fail the new, stricter validation. The controller will not be able to automatically fix invalid objects, however it will apply a [label](#automatic-labelling-of-invalid-objects) to invalid objects so that users can easily locate them. | ||
2. The second phase can occur once all invalid objects are cleared from the cluster. It will be the cluster admin's responsibility to check and detect when it is safe to move to the second phase. The CRD schema validation will be tightened and the webhook will stick around to enforce immutability until immutable fields come to CRDs (Custom Resource Definition). This will be accompanied by a version change to make it clear the CRD is using different validation, however the storage version will be kept as `v1beta1` to ensure a [rollback](#rollback) is possible at phase 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What "version change" do you mean? Cut a different release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version change in the api of the object. So volume snapshot moves from v1beta1 to v1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify in the KEP that "version change" means snapshot API version moves from v1beta1 to v1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quite a "move" at this point. We're just making a new api version available at this stage, and changing our controllers to use the new api version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What API version are you referring to? Is it "v1"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does phase 2 implies snapshot will go GA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded it to show that in phase 2 there will be a new snapshot version (v1). The plan is for phase 2 to be v1, implying GA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not GA ready, potentially the version could be v1beta2.
keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
Outdated
Show resolved
Hide resolved
LGTM |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AndiLi99, msau42, xing-yang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks Andi! |
Thank you everyone for all your help! Pleasure to work with you all. |
Create KEP extension for tightening validation on snapshot objects #1900
Some details are not fully fleshed out.
The release process has been outlined in order to tackle backwards compatibility concerns.
cc @xing-yang @msau42 @yuxiangqian @mattcary