Split Volume Populators KEP #2552
Conversation
Create a new KEP to address a long-standing bug in the PVC admission controller. This KEP is related to the Volume Populators KEP, but either KEP can stand alone. The primary reasons to have a separate KEP are to allow discussion of the specific bugfix, and to ensure that any fix is clearly communicated in the release notes.
This is effectively a bug fix.
/lgtm
/approve
> ## Table of Contents
>
> <!-- toc -->
It seems to be missing PRR...
I will fix this.
/assign @deads2k
Having read the KEP, this appears to be a description of how to fix a buggy/sub-par admission plugin. Today a non-functional PVC is silently created, and after this change the create will helpfully fail instead. PRR approved for stable.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bswartz, deads2k, saad-ali.
> ### Risks and Mitigations
>
> The main risk is if a user had some preconfigured workflow that involved creation of
To be clear, the new failure mode is:
before:
- Does not have alpha AnyVolumeDataSource enabled
- Creates PVC with invalid source
- Source gets wiped
- Success
after this change:
- Does not have alpha AnyVolumeDataSource enabled
- Creates PVC with invalid source
- Error
So it is, strictly speaking, a very breaking change. I agree that it seems INCREDIBLY unlikely that anyone is depending on this, and yet it makes me anxious.
@liggitt is this an appropriate use for API warnings? Like, should we issue warnings for a release or two?
Your characterization is completely accurate. The only thing I'll add is that in the extremely unlikely nightmare scenario you describe, the fix for an end-user is mind-blowingly simple -- just stop sending illegal data sources when creating PVCs. It's even harder to imagine the corner-case of a corner-case where someone both stumbles across this behavior change AND it takes them more than 10 seconds to work around the problem by updating their YAML.
Saad - I understand, but let's be empathetic. Having been on the receiving end of this, it's not cool. We don't have to live with every mistake forever, but we do have to make reasonable efforts to avoid egregious harm. I'm inclined to agree that, in this specific case, the risk is pretty darn close to zero. But it's not zero.
Ahh, yes, good point.
…On Tue, May 11, 2021 at 10:50 AM Ben Swartzlander ***@***.***> wrote:

> Simple fixes do not justify breaking changes. Unfortunately. Life would
> be easier if that was all it took :) If/when this goes in we should
> publish (release note) a 'kubectl get' command that prints any instances
> of this.
>
> I'm not sure what 'kubectl get' command we can publish. Part of the
> problem is that any client behavior that would run afoul of this change
> is erased before it lands in etcd, so there's no record the invalid data
> source was ever proffered.
so now on to the implications of stopping clearing this field: I anticipate three main sources for affected calls:
What is the current failure mode? A PVC that is broken (e.g. keeps pods from starting), or one that gets bound to an empty PV instead of the expected source?
> they ever put contents in the `DataSource` field) then it could go on unnoticed. Fixing
> this bug will cause that user's workflow to suddenly break.
>
> The workaround for the user would be trivial, they would just need to clear the
Depending on where the request is coming from, it might be:
- trivial (manifest under the user's control)
- possible (programmatic client that needs to be updated)
- difficult (programmatic client not under the user's control that needs to be updated)
- impossible (already-persisted statefulset with a spec.volumeClaimTemplates with an invalid data source)
What is the recourse for users on the more difficult end of the scale? How much time do we give them to move? Do we need to make changes to things like statefulset update validation to allow fixing this?
Even without this change, when the AnyVolumeDataSource feature goes beta, they will have this problem if they're using non-core objects (other than snapshots) as data sources. Those PVCs will be allowed to be created, but they will no longer bind. Those users would have to switch the feature gate off to get out of trouble if they want to continue to get empty volumes despite the non-empty data source field.
This PR extends the net to also catch PVCs with core object data sources, and to outright reject those because there isn't a future where they will become valid (here we presume that a failure to create a PVC is actually a more friendly error than a successfully created PVC that never binds). The only alternative I can see is perhaps to wrap up this behavior behind the same feature gate, but then we can't actually remove the DropDisabledFields() logic yet.
Currently, badly-formed PVC specs (containing invalid data sources) are silently accepted and the field is wiped. Because core objects (other than PVCs) can NEVER be data sources for the purposes of volume populators, those are rejected by existing logic [2] in the admission controller. However, that logic is never hit, because the DropDisabledFields() [3] code hides the invalid data source rather than allowing the admission controller to see it.

Given that we're changing the interpretation of the existing field anyway: if we choose not to do this, then forever there will be a special case for non-PVC core objects as the data sources of PVCs, where we ignore the input and give you an empty volume in that strange corner case only.

[1] https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1495-volume-populators
> and both of those types of data sources are GA today, it seems impossible that users
> who wanted to clone a PVC or a VolumeSnapshot would be doing so incorrectly, because the
> fact that they were receiving an empty volume would make them fix their workflow. It's
If the clone was intended to prime data as an optimization, and the app handled an empty volume gracefully by doing work from scratch, it seems very possible for this to be overlooked accidentally, and still be functional.
Discussing some options on Slack, but frankly this is a mess. Summary for people playing along at home: Today:
Any change we make to that behavior is a potentially breaking change - a StatefulSet which was working (despite relying on the field to be wiped) will stop working. We suspect, without much evidence, that the risk of exposure is minuscule, but it is real. Now we're thinking about mitigations.
@bswartz and I are brainstorming. Sadly, all of the options we have considered will require 1.22 to be a no-default-change release.

**Option 1: Mitigate breakage**

Roll 1.22 with two changes. First, stop allowing new bad values in StatefulSet PVCTemplate DataSources (breaking). Second, add a scrubber-controller which wipes StatefulSet PVCTemplate DataSources that are not snapshot or PVC (which would be wiped when the PVC is instantiated). Roll that out. This shifts the failure from the StatefulSet controller's PVC-creation time to StatefulSet-creation time, which is more likely (maybe?) to be seen by a human. We could even allow the value but ignore it, and send back an API warning that it is being ignored. Then in 1.23 we know that all stored PVCs and StatefulSets are either valid or empty, so we can add and preserve new values. This is STILL a breaking change for anyone who depends on the wipe, but we reduce the scope to (hopefully) just first parties. We know some users do make their own workload APIs which layer on StatefulSet and/or PVC, but this seems to drive the risk of error closer to 0 (since there's not actually much reason to use this DataSource field in a StatefulSet).

Pro: Cleanest

**Option 2: Opt-in**

Add a new field like

Pro: Should not be a breaking change for anyone

**Option 3: Reboot the feature**

Make a new

Pro: Cleaner, opt-in
All 3 of these options delay the eventual availability of populators another release, so that's not great from my perspective... Regarding option 1, I do like that it ends up with a clean API. There's a problem though. If someone wants to enable the AnyVolumeDataSource feature gate, then the scrubber/admission controller behavior would conflict with it. We'd need to make those new things be tied to the feature gate being off, such that when the feature gate turned on, they'd turn off. Regarding option 2, I like how simple and straightforward it is. I wonder why we would wait a whole release just to put this one new field in alpha. Could we tie it to the feature gate and make it beta right away? Having to carry this new "bandaid" field around forever is kind of gross, but honestly it seems like the least-bad option to me, if risk avoidance is paramount. I don't like the specific name though. I'd propose the following 5 alternatives:
Option 3 strikes me as a straightforward way to admit to a mistake and try to do it over. Long term it's also cleaner, but in the medium term lots of clients have to respond to the deprecation. I'm sure there are already many users of PVC and snapshot cloning today, and every one of them will be faced with changing their yaml or code or tolerating warnings. I could live with this, but option 2 appeals to me more.
Because of API server skew in HA clusters. Until the n-1 API server has the field, any write to the object via an API server on a prior version will drop the data from etcd. That's why new fields added to GA types always have a release to roll out without permitting data to be added to them, to ensure we don't persist data and then drop it via another write.
Thanks for this explanation!
OK, lemme try this on for size. Go with option 3 (admit failure, add a new field). Spec is already mostly immutable, so we don't need to worry about updates. On create: Let "old" mean
Legacy clients will hit either 1 or 2 or 5. Modern clients should use 7. Data populators always use the new field and ignore the old field.

Does that hold water?
Thanks for this, it's a good framework to think about it. I immediately have two questions:

1. What about existing PVCs already in etcd? Can we execute (2) as some kind of upgrade step?
2. Because the "user" of the data source field is an out-of-tree component (external provisioner sidecar), we will have to update that sidecar to consume the new field. We can specify that new sidecar versions depend on k8s v1.22, but we MUST continue to support old sidecars, which will continue to consume the "old" field. This makes (7) impossible to implement. Can we change (7) to do the inverse of what (2) does in order to support old external provisioner sidecars?
Okay, now that I'm more awake: the coupling between existing provisioner sidecars and the "old" field means that the only way to stop using the old field is to ensure that provisioners get upgraded at some point. We can officially deprecate the old field and immediately (in the next release) update the provisioner sidecar to look at both fields and prefer the "new" field if present. We would still need to change (7) to copy the data source to the "old" field if it was a PVC or snapshot (the only things that were ever valid), for backwards compatibility with old sidecars. Then we could set a clock for actual removal of the old field (something like 4-6 releases later, probably) and require that by that time CSI plugins have upgraded to the new sidecar.

If that clock expires in the future and we actually stop copying contents from the "new" field to the "old" field, and there's still an old sidecar running at that time, it just breaks. I'm not sure if deprecation allows us to get away with this kind of breakage. If not, then the only choice would be to continue filling in the old field for PVCs and snapshots forever, and the old field never really goes away.
We can set the "new" values on the fly on read operations.
This is a good point. All of my carefully orchestrated chaos above is for nought if clients still have to consume both "old" and "new", which they certainly do for now. The question is whether we want to give them a path out of it or just leave it as a well-documented bump. I have a slight lean towards a "path out", but I admit it's net-new work (not super hard work, but still). Within a year or two, clients can assume the "new" is valid and pretend the "old" field doesn't exist.
You are right. We could modify it.
Actually, no. We can't remove it from a GA API. We can undocument it and call it deprecated, but it has to stay and keep working. But that's not a huge deal, I think.
Okay, so option 3 could be described at a high level thusly: I think this compares unfavorably with option 2, which can be summarized like this: One new boolean field which only affects future data sources seems less ugly than duplicating an entire struct which affects the only 2 currently supported data sources today.
That's mostly fair. The rebuttal in defense of 3 is: Who cares how it looks on readback? There are always fields people don't care about. Option 3 gives the cleanest result over the long term for users who just want to use the API. At this point, I am happy enough with either option 2 or 3, and despite my preference, it's your API. Between you and the sig - it's your call. @msau42
I'd like @saad-ali's opinion too between options 2 and 3. With either option, we need to name one new field. In both cases I prefer the name
My choice of
Expect another update to KEP 1495 shortly (maybe 1-2 hours). After/if that update is approved, I will close this PR and abandon this KEP.
I sync'd with @bswartz offline on this.
Requires delaying moving this feature forward for another couple of releases, which is not ideal.
This means to use a new data source you have to forever set a random boolean flag -- that would be an ugly user experience longer term.
I do think it is the cleanest design. This is something we considered in the past but rejected because it would cause inconsistency with Snapshot and Cloning. The other major drawback of this approach is that it still leaves the existing "DataSource trap door" wide open. I can imagine someone naively sticking the value of DataSourceRef into DataSource and the controllers silently swallowing this PVC and creating an empty PVC. That said, I understand the arguments for backwards compatibility as well, which require us to maintain this behavior. We can minimize it through documentation, comments, and deprecation notices.

Conclusion: I am fine with option 3.
ARGH!!! In the act of rewriting the original KEP to use option 3, I remembered the other huge problem we've been trying to avoid from the beginning. In @thockin's example above, case 8 says:
This is the case that all new volume populators would use. The problem here is that existing dynamic provisioners (ones that don't look at the new
@saad-ali and @thockin I've rewritten the original KEP to include option 3. That PR is here: #2738 This means remaining alpha in 1.22. I did not resolve the last-minute issue mentioned above (yet). Please review the KEP, and if you have a suggestion for how to prevent existing dynamic provisioners from thinking they're supposed to generate empty volumes in response to new clients using the new API, please let me know. My current thinking is that putting any unrecognized value into the existing
Interesting point. So we revise it again to:
Legacy clients will hit either 1 or 2 or 5 (including StatefulSet with stored invalid data). Modern clients should use 7 or 8. Data populators can always read the new field and ignore the old field. Back-rev populators will read the old field. Back-rev provisioners will read the old field, and in case 8 they will see a dataSource there ?
Yes, I agree. Oh, and I like the idea to literally use the value that was specified instead of some sentinel value. I will update the KEP.
Further discussion of this issue will continue on #2738