-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add data source for volume provisioning #2495
Conversation
propose to add data source for volume operations including provisoning volume from snapshot, and cloning volumes
type PersistentVolumeSpec struct { | ||
// If specified, volume will be prepopulated with data from the PVCDataSourceRef. | ||
// +optional | ||
DataSourceRef *ypedLocalObjectReference `json:"dataSourceRef" protobuf:"bytes,2,opt,name=dataSourceRef"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be named DataSource to be consistent with line 18?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// TypedLocalObjectReference contains enough information to let you locate the referenced object inside the same namespace. | ||
type TypedLocalObjectReference struct { | ||
// Name of the object reference. | ||
Name string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which line?
``` | ||
|
||
## Use cases | ||
* Use snapshot to backup data: Alice wants to take a snapshot of his Mongo database, and accidentally delete her tables, she wants to restore her volumes from the snapshot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/his/her
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
namespace: myns | ||
spec: | ||
source: | ||
Kind: PersistentVolumeClaim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lower case "kind"
spec: | ||
source: | ||
Kind: PersistentVolumeClaim | ||
Name: podpvc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lower case "name"
cc @thockin |
// If specified, volume was pre-populated with data from the specified data source. | ||
// +optional | ||
DataSource *ypedLocalObjectReference `json:"dataSourceRef" protobuf:"bytes,2,opt,name=dataSourceRef"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : *TypedLocalObjectReference
missed T
kind: VolumeSnapshot | ||
metadata: | ||
name: snapshot-pd-1 | ||
namespace: myns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change this and other uses of myns
to mynamespace
. I had a bit of trouble grokking that myns
was a shortened version of mynamespace
and think that might be a bit clearer .
## Design | ||
A new DataSource field is proposed to add to both PVC and PV to represent the source of the data which is prepopulated to the provisioned volume. If an external-provisioner does not understand the new DataSource field and cannot populate the data to the volume, PV/PVC controller should be able to detect that by comparing DataSource field in PV and PVC (i.e., PVC has DataSource but PV does not) and fail the operation. | ||
|
||
For DataSource, we propose to define a new type “TypedLocalObjectReference”. It is similar to “LocalObjectReference” type with additional Kind field in order to support multiple data source types. In the alpha version, this data source is restricted in the same namespace of the PVC. The following are the APIs we propose to add. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
support multiple data source types
does this mean DataSource
could support other sources outside of kubernetes such as Google Cloud buckets or AWS block storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, it is possible since data source allows users to specify different types. But in order to support it, we need to check whether it makes sense and see how to implement it. For example, if your data source is AWS block storage, it does not make sense to try to create a gce disk from this source.
cc @thockin |
Address design proposal: kubernetes/community#2495 This change add a condition called "VolumeDataPopulated" into PV. With the feature of snapshot and data source, volume might be created from snapshot source with data populated into the volume instead of an empty one. The external provision needs to understand the data source and also mark the condition "VolumeDataPopualted" to true. The PV controller will check this condition if the data source of PVC is specified. Only if the condition is true, the PVC can be marked as "Bound" phase.
// These are valid conditions of Pvc | ||
const ( | ||
// An user trigger resize of pvc has been started | ||
PersistentVolumeDataPopulated PersistentVolumeConditionType = "dataPopulated" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataPopulated
Type PersistentVolumeConditionType | ||
Status ConditionStatus | ||
// +optional | ||
LastProbeTime metav1.Time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Will remove it.
### Error Handling | ||
If an external-provisioner does not understand the new DataSource field and cannot populate the data to the volume, we propose to add a PV condition to detect this situation and fail the operation. The idea is simlar to the design of ["Pod Ready++"](https://github.com/kubernetes/community/blob/master/keps/sig-network/0007-pod-ready%2B%2B.md) | ||
|
||
First we introduce a condition called "VolumeDataPopulated" into PV. With the feature of snapshot and data source, volume might be created from snapshot source with data populated into the volume instead of an empty one. The external provision needs to understand the data source and also mark the condition "VolumeDataPopualted" to true. The PV controller will check this condition if the data source of PVC is specified. Only if the condition is true, the PVC can be marked as "Bound" phase. In case of an old version of provisioner is in use and it cannot recognize the data source field in PVC, an empty Volume will be provisioned. In this situation, since the "VolumeDataPouplated" condition is not set to true by the provisioner, the PV controller will not mark PVC as "Bound" phase so that user cannot use this PVC yet. We can also add this condition into PVC to tell the users that the PVC's data is not populated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What ready++ did was make readiness a generic thing. You want to add the readinessGates[] field, and the "ready" signal is when all of the conditions listed in readinessGates are "True". Check @freehan's doc for details about validation required to disallow updates of readinessGates (I forget what we decided in the end).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is possible to make readiness for PV more generic. I think one difference between this PV condition readiness and pod ready++ design is that pod ready++ more focuses on user defined conditions while the PV condition proposed here (DataPopulated) is not the case. For pod, user can specify arbitrary conditions based on their specific needs. For PVC/PV, if user put data sources in their PVC spec, system should check and guarantee this condition by default. So far, we haven't seen the use cases of conditions defined by user side.
With current design, the change is very small (PR kubernetes/kubernetes@8046850) . pv controller will check DataPopulated condition as long as dataSource field is set without any user intervention. If following ready++ pattern, we need to make sure to add the condition by PVC is created (most likely in admission controller).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing I am saying is that this is an extension point. It should be extensible. In fact maybe the name of the condition doesn't even need to be codified at all. How does this feel?
- add
readinessGates []string
- provisioners that need to initialize data as a second step add a readiness gate for themselves
- provisioners that allocate and init in one step do not need anything
- PVCs are not ready until all of their readinessGates's Conditions are True
- same mutability / immutability rules as @freehan work :)
So the real questions I have:
-
Do we actually have provisioners that do 2-step initalization today which can't hide that fact (e.g. don't create the PV until it is done)?
-
Can we conceive of any hypothetically plausible other readiness gates for PVC that would use the same mechanism, or could it just be a well-understood bool?
@@ -46,18 +48,16 @@ type PersistentVolumeStatus struct { | |||
|
|||
type PersistentVolumeConditionType string | |||
|
|||
// These are valid conditions of Pvc | |||
// These are valid conditions of PersistentVolume | |||
const ( | |||
// An user trigger resize of pvc has been started |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PersistentVolume rather than pvc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this condition describes whether data is populated to a PersistentVolume.
The comments should be fixed though. This is not about "resize".
@@ -3,7 +3,11 @@ | |||
Note: this proposal is part of [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) feature design, and also relevant to recently proposed Volume Clone feature. | |||
|
|||
## Goal | |||
In Volume Snapshot proposal, a snapshot is now represented as first-class CRD objects and an external snapshot controller is responsible for managing its lifecycle. With Snapshot API available, users could provision volumes from snapshot and data will be prepopulated to the volumes. Also considering clone and other possible storage operations, there could be many different types of sources used for populating the data to the volumes. In this proposal, we add a general "DataSource" which could be used to represent different types of data sources. | |||
Currently in Kuberentes, volume plugin only supports to provision an empty volume. With the new storage features (including [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) and [volume clone](https://github.com/erinboyd/community/blob/patch-3/contributors/design-proposals/storage/cloning.md)) being proposed, there is a need to support data population for volume provisioning. For example, volume can be created from a snapshot source, or volume could be cloned from another volume source. Depending on the sources for creating the volume, there are two scenarios |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should refer to this PR for cloning instead: #2533
@@ -3,7 +3,11 @@ | |||
Note: this proposal is part of [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) feature design, and also relevant to recently proposed Volume Clone feature. | |||
|
|||
## Goal | |||
In Volume Snapshot proposal, a snapshot is now represented as first-class CRD objects and an external snapshot controller is responsible for managing its lifecycle. With Snapshot API available, users could provision volumes from snapshot and data will be prepopulated to the volumes. Also considering clone and other possible storage operations, there could be many different types of sources used for populating the data to the volumes. In this proposal, we add a general "DataSource" which could be used to represent different types of data sources. | |||
Currently in Kuberentes, volume plugin only supports to provision an empty volume. With the new storage features (including [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) and [volume clone](https://github.com/erinboyd/community/blob/patch-3/contributors/design-proposals/storage/cloning.md)) being proposed, there is a need to support data population for volume provisioning. For example, volume can be created from a snapshot source, or volume could be cloned from another volume source. Depending on the sources for creating the volume, there are two scenarios | |||
1. Volume provisioner can recoginize the source and be able to create the volume from the source directly (e.g., restore snapshot to a volume or clone volume). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/recoginize/recognize
// +optional | ||
Conditions []PersistentVolumeCondition | ||
} | ||
For other types of data sources that require external data populator, volume creation and data population are two seperate steps. Only when data is ready, PVC/PV can be marked as ready (Bound) so that users can start use them. We are working on a seperate proposal to address this using similar idea from ["Pod Ready++"](https://github.com/kubernetes/community/blob/master/keps/sig-network/0007-pod-ready%2B%2B.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/start use/start to use
Message string | ||
} | ||
``` | ||
Note: In order to use this data source feature, user/admin needs to update to the new external provisioner which can recognize snapshot data source. Otherwise, data source will be ignoed and an empty volume will be created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ignoed/ignored
Add a period at the end of the sentence.
/lgtm verify is failing for spelling. |
W0824 18:09:38.652] contributors/design-proposals/storage/data-source.md:38:114: "seperate" is a misspelling of "separate" |
@thockin, address the comments. PTAL. Thanks! |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Add DataSource and TypedLocalObjectReference **What this PR does / why we need it**: This PR adds TypedLocalObjectReference in the core API and adds DataSource in PersistentVolumeClaimSpec. It also enables feature gate for VolumeSnapshotDataSource. This is part of the CSI snapshot design proposal to support restoring a volume from a snapshot: kubernetes/community#2495 **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes # kubernetes/enhancements#177 **Special notes for your reviewer**: **Release note**: ```release-note Added support to restore a volume from a volume snapshot data source. ```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Add DataSource and TypedLocalObjectReference **What this PR does / why we need it**: This PR adds TypedLocalObjectReference in the core API and adds DataSource in PersistentVolumeClaimSpec. It also enables feature gate for VolumeSnapshotDataSource. This is part of the CSI snapshot design proposal to support restoring a volume from a snapshot: kubernetes/community#2495 **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes # kubernetes/enhancements#177 **Special notes for your reviewer**: **Release note**: ```release-note Added support to restore a volume from a volume snapshot data source. ``` Kubernetes-commit: d97ece0f36578b3af1c9ce275d184d711aa29baf
Add data source for volume provisioning
propose to add data source for volume operations including provisioning
volume from snapshot, and cloning volumes. This is part of the CSI snapshot proposal too #2335