-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve AWS & GCP volume AZ when snapshotting and restoring PVs. #102
Preserve AWS & GCP volume AZ when snapshotting and restoring PVs. #102
Conversation
Can one of the admins verify this patch? |
Looking for some initial feedback and suggestions on validating this change, esp on the GCP and Azure interface implementations. |
Thanks for the PR @ashish-amarnath! We'll take a look shortly and provide some feedback. |
It would be ideal to remove the availability zone in the AWSConfig altogether. As having a comma-separated list of strings may break assumptions that other areas of code may be making when using the |
A few initial thoughts on approach: -I think it'll be best to pull the AvailabilityZone for a PV from the Kubernetes API rather than from the cloud provider API. We can do this in volume_snapshot_action.go by looking up the appropriate label in the unstructured volume map. The reason for this is that for GCE, knowing the zone is required to make API calls (for example, to describe the volume), so we can't rely on making an API call to get the zone, we need to know it beforehand (i.e. from K8s metadata). Let me know if this approach seems reasonable. Happy to do more detailed review/guidance as well. cc @jrnt30 @ncdc - feel free to comment if you have other thoughts or disagree with any of my comments! |
@skriss I need to take a closer look at your first suggestion to understand the distinction there, but will let you know if I have any concerns. Otherwise the approach is inline with what I was thinking should be possible, thanks for the validation and the heads up on the GCP nuances. @ashish-amarnath If you get stuck or want to pair, I would be happy to spend some time on Zoom or Screen Hero chatting about this depending on schedule alignment. |
@skriss I've updated the PR to read the availability-zone info from the PV itself, instead of getting it from the cloud provider. This way we'll take a dependency on kubernetes' labelling of AZs instead of the cloud-provider APIs. |
Thanks @ashish-amarnath One thing we need to have you do is |
}, | ||
}, | ||
{ | ||
name: "aws - dynamically provisioned volume id", | ||
snapshotEnabled: true, | ||
pv: `{"apiVersion": "v1", "kind": "PersistentVolume", "metadata": {"name": "mypv"}, "spec": {"awsElasticBlockStore": {"volumeID": "aws://us-west-2a/vol-abc123"}}}`, | ||
pv: `{"apiVersion": "v1", "kind": "PersistentVolume", "metadata": {"name": "mypv", "labels": {"failure-domain.beta.kubernetes.io/zone": "us-east-1c"}}, "spec": {"awsElasticBlockStore": {"volumeID": "aws://us-west-2a/vol-abc123"}}}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a nit, but can we have the failureDomain
and the nested "AZ" from the volumeID
attributes match up to be more logically accurate please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will do.
@@ -67,8 +67,8 @@ func NewSnapshotService(blockStorage BlockStorageAdapter) SnapshotService { | |||
} | |||
} | |||
|
|||
func (sr *snapshotService) CreateVolumeFromSnapshot(snapshotID string, volumeType string, iops *int64) (string, error) { | |||
volumeID, err := sr.blockStorage.CreateVolumeFromSnapshot(snapshotID, volumeType, iops) | |||
func (sr *snapshotService) CreateVolumeFromSnapshot(snapshotID string, volumeType string, volAZ string, iops *int64) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a nit... but you can have snapshotID, volumeType, volAZ string, iops *int64
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Didn't want to change the style of the original code. But, will change.
@@ -88,10 +73,10 @@ func NewBlockStorageAdapter(region, availabilityZone string) (cloudprovider.Bloc | |||
// from snapshot. | |||
var iopsVolumeTypes = sets.NewString("io1") | |||
|
|||
func (op *blockStorageAdapter) CreateVolumeFromSnapshot(snapshotID, volumeType string, iops *int64) (volumeID string, err error) { | |||
func (op *blockStorageAdapter) CreateVolumeFromSnapshot(snapshotID, volumeType, volAZ string, iops *int64) (volumeID string, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we just fully spell this out to volumeAZ
similar to volumeType
(everywhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
@@ -155,6 +155,30 @@ func TestVolumeSnapshotAction(t *testing.T) { | |||
pv: `{"apiVersion": "v1", "kind": "PersistentVolume", "metadata": {"name": "mypv"}, "spec": {"gcePersistentDisk": {"pdName": "pd-abc123"}}}`, | |||
expectError: true, | |||
}, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these 2 additional test cases - how are they different from what's above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not different from the other test cases and are repetitive tests. Will remove them.
@@ -64,22 +60,11 @@ func NewBlockStorageAdapter(region, availabilityZone string) (cloudprovider.Bloc | |||
return nil, err | |||
} | |||
|
|||
// validate the availabilityZone | |||
var ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a var
block any more - please move this down below into the blockStorageAdapter
assignment directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will do.
pkg/apis/ark/v1/backup.go
Outdated
@@ -109,6 +109,10 @@ type VolumeBackupInfo struct { | |||
// API. | |||
Type string `json:"type"` | |||
|
|||
// AvailabilityZone is the where the volume is provisioned | |||
// in the cloud provider. | |||
AvailabilityZone string `json:"avaialabiltyZone"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small typo in the json tag -- should be json:"availabilityZone"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that!
pkg/backup/volume_snapshot_action.go
Outdated
@@ -59,6 +59,14 @@ func (a *volumeSnapshotAction) Execute(ctx ActionContext, volume map[string]inte | |||
|
|||
metadata := volume["metadata"].(map[string]interface{}) | |||
name := metadata["name"].(string) | |||
labels := metadata["labels"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some helpers for navigating these maps in pkg/util/collections. You can get the labels map with labelsMap, err := collections.GetMap(metadata, "labels")
. Unfortunately, you can't go straight to the zone right now because our helpers take dot-separated paths, which won't work given this label, but this should at least simplify getting the labels map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. That should simplify things a little bit.
I am not sure what I am doing wrong for the sign-off not to be picked up. Any suggestions on how to resolve the sign-off? |
I think the issue you have is there a mixture of signed/unsigned commits and a few that are repeated. The good news is that it's fixable, the bad news it that it's a bit of a pain :) This assumes that you have two remotes for your repo
This should be the pick-list you need to collapse the incorrect commits to remove the duplicate sets of commits.
If you do a Test your stuff and if that looks good NOTE: If you run into a problem, you should be able to |
I'm happy with the code. We do need to update the docs and sample files to remove AZ references from config. The main places are: examples/aws/00-ark-config.yaml But I would also look around the docs/ folder a bit more to see if there are any other references. |
@ashish-amarnath when doing the interactive rebase, you could also choose to squash everything down to 1 commit if that makes sense. Let us know if you need any more guidance here. Thanks! |
@ashish-amarnath when you have time, please rebase, squash down to 1 commit, and update docs as @skriss suggested. Thanks! |
@ashish-amarnath which platforms (AWS, GCP, Azure) have you been able to test this on? I can help out with some testing on any of those. |
pkg/apis/ark/v1/backup.go
Outdated
@@ -109,6 +109,10 @@ type VolumeBackupInfo struct { | |||
// API. | |||
Type string `json:"type"` | |||
|
|||
// AvailabilityZone is the where the volume is provisioned | |||
// in the cloud provider. | |||
AvailabilityZone string `json:"availabilityZone"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's an an omitempty
to the JSON tag (same as for Iops just below).
pkg/backup/volume_snapshot_action.go
Outdated
var pvfailureDomainZone string | ||
labelsMap, err := collections.GetMap(metadata, "labels") | ||
if err == nil { | ||
pvfailureDomainZone = labelsMap["failure-domain.beta.kubernetes.io/zone"].(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to do a nil-check here when getting failure-domain.beta.kubernetes.io/zone
out of the map to handle the case where this label doesn't exist (currently it panics when it tries to convert to string). Can you also make sure each situation has an appropriate log statement (no labels collection, no zone key, or zone found)? Please also add a unit test for the "no zone label" scenario. Thanks!
@ashish-amarnath I was able to test this on all three platforms; two minor issues which I added comments for but otherwise it's looking good! |
- Read PV's AZ info from fault-domain label of the PV object for snapshotting. - Store PV's AZ info in the VolumeInfo. - Add tests for reading the label from the PV object. - Remove availability zone validation in AWS and GCP BlockStorageAdaptor. - Add volumeAZ as a parameter to methods in the BlockStorageAdapter interface. - Get AZ from VolumeInfo when restoring PV snapshot. - Remove references to PV availability zone in docs. Signed-off-by: Ashish Amarnath <[email protected]>
Signed-off-by: Ashish Amarnath <[email protected]>
if labelsMap["failure-domain.beta.kubernetes.io/zone"] != nil { | ||
pvfailureDomainZone = labelsMap["failure-domain.beta.kubernetes.io/zone"].(string) | ||
} else { | ||
ctx.log("error getting 'failure-domain.beta.kubernetes.io/zone' label on PersistentVolume %q for backup %q.\n", name, backupName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No \n
needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the log message to indicate that this label isn't present
ctx.log("error getting 'failure-domain.beta.kubernetes.io/zone' label on PersistentVolume %q for backup %q.\n", name, backupName) | ||
} | ||
} else { | ||
ctx.log("error getting labels on PersistentVolume %q for backup %q. ", name, backupName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the extra space at the end of the string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include the error via %v
@@ -60,6 +61,17 @@ func (a *volumeSnapshotAction) Execute(ctx ActionContext, volume map[string]inte | |||
|
|||
metadata := volume["metadata"].(map[string]interface{}) | |||
name := metadata["name"].(string) | |||
var pvfailureDomainZone string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capital F for Failure
re-tested on Azure & GCP; looks good |
ok to test |
good to merge. we'll tweak the log messages in a follow-up. @ashish-amarnath thanks again for the contribution!! |
@ashish-amarnath I addressed the requested changes from @ncdc so we're good here. Thanks! |
skip backuping projected volume
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.0 to 1.8.1. - [Release notes](https://github.com/stretchr/testify/releases) - [Commits](stretchr/testify@v1.8.0...v1.8.1) --- updated-dependencies: - dependency-name: github.com/stretchr/testify dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Update VolumeInfo struct to store the AZ where the vol was provisioned. Update GetVolumeInfo method to get AZ information for the volume Lookup volume's AZ from VolumeInfo to use while restoring volume from it's snapshot.
Basic validation with change on heptio/ark: