Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data source for volume provisioning #2495

Merged
merged 12 commits into from
Aug 25, 2018

Conversation

jingxu97
Copy link
Contributor

@jingxu97 jingxu97 commented Aug 7, 2018

propose to add data source for volume operations including provisioning
volume from snapshot, and cloning volumes. This is part of the CSI snapshot proposal too #2335

propose to add data source for volume operations including provisoning
volume from snapshot, and cloning volumes
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/design Categorizes issue or PR as related to design. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Aug 7, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 7, 2018
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 7, 2018
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 9, 2018
type PersistentVolumeSpec struct {
// If specified, volume will be prepopulated with data from the PVCDataSourceRef.
// +optional
DataSourceRef *ypedLocalObjectReference `json:"dataSourceRef" protobuf:"bytes,2,opt,name=dataSourceRef"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be named DataSource to be consistent with line 18?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// TypedLocalObjectReference contains enough information to let you locate the referenced object inside the same namespace.
type TypedLocalObjectReference struct {
// Name of the object reference.
Name string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which line?

```

## Use cases
* Use snapshot to backup data: Alice wants to take a snapshot of his Mongo database, and accidentally delete her tables, she wants to restore her volumes from the snapshot.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/his/her

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

namespace: myns
spec:
source:
Kind: PersistentVolumeClaim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower case "kind"

spec:
source:
Kind: PersistentVolumeClaim
Name: podpvc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower case "name"

@jingxu97
Copy link
Contributor Author

cc @thockin

// If specified, volume was pre-populated with data from the specified data source.
// +optional
DataSource *ypedLocalObjectReference `json:"dataSourceRef" protobuf:"bytes,2,opt,name=dataSourceRef"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : *TypedLocalObjectReference missed T

kind: VolumeSnapshot
metadata:
name: snapshot-pd-1
namespace: myns

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change this and other uses of myns to mynamespace. I had a bit of trouble grokking that myns was a shortened version of mynamespace and think that might be a bit clearer .

## Design
A new DataSource field is proposed to add to both PVC and PV to represent the source of the data which is prepopulated to the provisioned volume. If an external-provisioner does not understand the new DataSource field and cannot populate the data to the volume, PV/PVC controller should be able to detect that by comparing DataSource field in PV and PVC (i.e., PVC has DataSource but PV does not) and fail the operation.

For DataSource, we propose to define a new type “TypedLocalObjectReference”. It is similar to “LocalObjectReference” type with additional Kind field in order to support multiple data source types. In the alpha version, this data source is restricted in the same namespace of the PVC. The following are the APIs we propose to add.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

support multiple data source types does this mean DataSource could support other sources outside of kubernetes such as Google Cloud buckets or AWS block storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, it is possible since data source allows users to specify different types. But in order to support it, we need to check whether it makes sense and see how to implement it. For example, if your data source is AWS block storage, it does not make sense to try to create a gce disk from this source.

@jingxu97
Copy link
Contributor Author

cc @thockin

jingxu97 added a commit to jingxu97/kubernetes that referenced this pull request Aug 20, 2018
Address design proposal: kubernetes/community#2495

This change add a condition called "VolumeDataPopulated" into PV. With
the feature of snapshot and data source, volume might be created from snapshot source with data
populated into the volume instead of an empty one. The external
provision needs to understand the data source and also mark the
condition "VolumeDataPopualted" to true. The PV controller will check this condition if the data source of PVC is specified. Only if the condition is true, the PVC can be marked as "Bound" phase.
// These are valid conditions of Pvc
const (
// An user trigger resize of pvc has been started
PersistentVolumeDataPopulated PersistentVolumeConditionType = "dataPopulated"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataPopulated

Type PersistentVolumeConditionType
Status ConditionStatus
// +optional
LastProbeTime metav1.Time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Will remove it.

### Error Handling
If an external-provisioner does not understand the new DataSource field and cannot populate the data to the volume, we propose to add a PV condition to detect this situation and fail the operation. The idea is simlar to the design of ["Pod Ready++"](https://github.com/kubernetes/community/blob/master/keps/sig-network/0007-pod-ready%2B%2B.md)

First we introduce a condition called "VolumeDataPopulated" into PV. With the feature of snapshot and data source, volume might be created from snapshot source with data populated into the volume instead of an empty one. The external provision needs to understand the data source and also mark the condition "VolumeDataPopualted" to true. The PV controller will check this condition if the data source of PVC is specified. Only if the condition is true, the PVC can be marked as "Bound" phase. In case of an old version of provisioner is in use and it cannot recognize the data source field in PVC, an empty Volume will be provisioned. In this situation, since the "VolumeDataPouplated" condition is not set to true by the provisioner, the PV controller will not mark PVC as "Bound" phase so that user cannot use this PVC yet. We can also add this condition into PVC to tell the users that the PVC's data is not populated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What ready++ did was make readiness a generic thing. You want to add the readinessGates[] field, and the "ready" signal is when all of the conditions listed in readinessGates are "True". Check @freehan's doc for details about validation required to disallow updates of readinessGates (I forget what we decided in the end).

Copy link
Contributor Author

@jingxu97 jingxu97 Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible to make readiness for PV more generic. I think one difference between this PV condition readiness and pod ready++ design is that pod ready++ more focuses on user defined conditions while the PV condition proposed here (DataPopulated) is not the case. For pod, user can specify arbitrary conditions based on their specific needs. For PVC/PV, if user put data sources in their PVC spec, system should check and guarantee this condition by default. So far, we haven't seen the use cases of conditions defined by user side.

With current design, the change is very small (PR kubernetes/kubernetes@8046850) . pv controller will check DataPopulated condition as long as dataSource field is set without any user intervention. If following ready++ pattern, we need to make sure to add the condition by PVC is created (most likely in admission controller).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I am saying is that this is an extension point. It should be extensible. In fact maybe the name of the condition doesn't even need to be codified at all. How does this feel?

  • add readinessGates []string
  • provisioners that need to initialize data as a second step add a readiness gate for themselves
  • provisioners that allocate and init in one step do not need anything
  • PVCs are not ready until all of their readinessGates's Conditions are True
  • same mutability / immutability rules as @freehan work :)

So the real questions I have:

  1. Do we actually have provisioners that do 2-step initalization today which can't hide that fact (e.g. don't create the PV until it is done)?

  2. Can we conceive of any hypothetically plausible other readiness gates for PVC that would use the same mechanism, or could it just be a well-understood bool?

@@ -46,18 +48,16 @@ type PersistentVolumeStatus struct {

type PersistentVolumeConditionType string

// These are valid conditions of Pvc
// These are valid conditions of PersistentVolume
const (
// An user trigger resize of pvc has been started

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PersistentVolume rather than pvc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this condition describes whether data is populated to a PersistentVolume.

The comments should be fixed though. This is not about "resize".

@jingxu97 jingxu97 changed the title Add data source for volume operations Add data source for volume provisioning Aug 23, 2018
@@ -3,7 +3,11 @@
Note: this proposal is part of [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) feature design, and also relevant to recently proposed Volume Clone feature.

## Goal
In Volume Snapshot proposal, a snapshot is now represented as first-class CRD objects and an external snapshot controller is responsible for managing its lifecycle. With Snapshot API available, users could provision volumes from snapshot and data will be prepopulated to the volumes. Also considering clone and other possible storage operations, there could be many different types of sources used for populating the data to the volumes. In this proposal, we add a general "DataSource" which could be used to represent different types of data sources.
Currently in Kuberentes, volume plugin only supports to provision an empty volume. With the new storage features (including [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) and [volume clone](https://github.com/erinboyd/community/blob/patch-3/contributors/design-proposals/storage/cloning.md)) being proposed, there is a need to support data population for volume provisioning. For example, volume can be created from a snapshot source, or volume could be cloned from another volume source. Depending on the sources for creating the volume, there are two scenarios
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should refer to this PR for cloning instead: #2533

@@ -3,7 +3,11 @@
Note: this proposal is part of [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) feature design, and also relevant to recently proposed Volume Clone feature.

## Goal
In Volume Snapshot proposal, a snapshot is now represented as first-class CRD objects and an external snapshot controller is responsible for managing its lifecycle. With Snapshot API available, users could provision volumes from snapshot and data will be prepopulated to the volumes. Also considering clone and other possible storage operations, there could be many different types of sources used for populating the data to the volumes. In this proposal, we add a general "DataSource" which could be used to represent different types of data sources.
Currently in Kuberentes, volume plugin only supports to provision an empty volume. With the new storage features (including [Volume Snapshot](https://github.com/kubernetes/community/pull/2335) and [volume clone](https://github.com/erinboyd/community/blob/patch-3/contributors/design-proposals/storage/cloning.md)) being proposed, there is a need to support data population for volume provisioning. For example, volume can be created from a snapshot source, or volume could be cloned from another volume source. Depending on the sources for creating the volume, there are two scenarios
1. Volume provisioner can recoginize the source and be able to create the volume from the source directly (e.g., restore snapshot to a volume or clone volume).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/recoginize/recognize

// +optional
Conditions []PersistentVolumeCondition
}
For other types of data sources that require external data populator, volume creation and data population are two seperate steps. Only when data is ready, PVC/PV can be marked as ready (Bound) so that users can start use them. We are working on a seperate proposal to address this using similar idea from ["Pod Ready++"](https://github.com/kubernetes/community/blob/master/keps/sig-network/0007-pod-ready%2B%2B.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/start use/start to use

Message string
}
```
Note: In order to use this data source feature, user/admin needs to update to the new external provisioner which can recognize snapshot data source. Otherwise, data source will be ignoed and an empty volume will be created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ignoed/ignored

Add a period at the end of the sentence.

@thockin
Copy link
Member

thockin commented Aug 24, 2018

/lgtm
/approve

verify is failing for spelling.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 24, 2018
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2018
@thockin
Copy link
Member

thockin commented Aug 24, 2018

W0824 18:09:38.652] contributors/design-proposals/storage/data-source.md:38:114: "seperate" is a misspelling of "separate"
W0824 18:09:38.653] contributors/design-proposals/storage/data-source.md:38:250: "seperate" is a misspelling of "separate"

@jingxu97
Copy link
Contributor Author

@thockin, address the comments. PTAL. Thanks!

@thockin
Copy link
Member

thockin commented Aug 25, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 0741bdf into kubernetes:master Aug 25, 2018
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Aug 29, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Add DataSource and TypedLocalObjectReference

**What this PR does / why we need it**:
This PR adds TypedLocalObjectReference in the core API and adds DataSource in PersistentVolumeClaimSpec.

It also enables feature gate for VolumeSnapshotDataSource.

This is part of the CSI snapshot design proposal to support restoring a volume from a snapshot: 
kubernetes/community#2495

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
kubernetes/enhancements#177

**Special notes for your reviewer**:

**Release note**:

```release-note
Added support to restore a volume from a volume snapshot data source. 
```
k8s-publishing-bot added a commit to kubernetes/api that referenced this pull request Aug 29, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Add DataSource and TypedLocalObjectReference

**What this PR does / why we need it**:
This PR adds TypedLocalObjectReference in the core API and adds DataSource in PersistentVolumeClaimSpec.

It also enables feature gate for VolumeSnapshotDataSource.

This is part of the CSI snapshot design proposal to support restoring a volume from a snapshot: 
kubernetes/community#2495

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
kubernetes/enhancements#177

**Special notes for your reviewer**:

**Release note**:

```release-note
Added support to restore a volume from a volume snapshot data source. 
```

Kubernetes-commit: d97ece0f36578b3af1c9ce275d184d711aa29baf
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
Add data source for volume provisioning
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants