Preserve AWS & GCP volume AZ when snapshotting and restoring PVs. #102

Merged
merged 2 commits into from
Oct 10, 2017

Conversation

ashish-amarnath
Member

Update the VolumeInfo struct to store the AZ where the volume was provisioned. Update the GetVolumeInfo method to get AZ information for the volume. Look up the volume's AZ from VolumeInfo to use while restoring the volume from its snapshot.

Basic validation with change on heptio/ark:

  • Compiles successfully
  • 'go test ./...' all tests pass
  • 'make all' is successful

@heptibot

Can one of the admins verify this patch?

@ashish-amarnath
Member Author

Looking for some initial feedback and suggestions on validating this change, especially on the GCP and Azure interface implementations.

@skriss
Contributor

skriss commented Sep 25, 2017

Thanks for the PR @ashish-amarnath! We'll take a look shortly and provide some feedback.

@ashish-amarnath
Member Author

It would be ideal to remove the availability zone from the AWSConfig altogether, as having a comma-separated list of strings may break assumptions that other areas of the code make when using the AvailabilityZone from the AWSConfig.

@skriss
Contributor

skriss commented Sep 25, 2017

A few initial thoughts on approach:

- I think it'll be best to pull the AvailabilityZone for a PV from the Kubernetes API rather than from the cloud provider API. We can do this in volume_snapshot_action.go by looking up the appropriate label in the unstructured volume map. The reason is that for GCE, knowing the zone is required to make API calls (for example, to describe the volume), so we can't rely on making an API call to get the zone; we need to know it beforehand (i.e. from K8s metadata).
- Storing the AZ in the VolumeBackupInfo struct (as you have it) makes sense.
- We should be able to remove AvailabilityZone and Zone from the AWS and GCP configs, respectively, since they're only used for snapshot creation/restores. The validation step in the constructor for the BlockStorageAdapters is then no longer necessary.
- We'll have to pass the AZ as an arg to a few additional SnapshotService methods to support GCP. At a glance, these look like CreateVolumeFromSnapshot, GetVolumeInfo, IsVolumeReady, and CreateSnapshot.
- Azure has just announced AZs in preview, so for now we don't need to add support.

Let me know if this approach seems reasonable. Happy to do more detailed review/guidance as well.

cc @jrnt30 @ncdc - feel free to comment if you have other thoughts or disagree with any of my comments!
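
The approach outlined in the comment above (know the zone up front and pass it into every call) can be sketched roughly as follows. This is an illustration only: the trimmed-down `SnapshotService` interface, the `fakeZonalService` type, and all IDs are hypothetical, and the real method set and signatures in the project may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// SnapshotService is a hypothetical, trimmed-down interface used only for
// this sketch. The point is that the zone is an explicit argument rather
// than a value read from provider configuration.
type SnapshotService interface {
	CreateSnapshot(volumeID, volumeAZ string) (string, error)
	CreateVolumeFromSnapshot(snapshotID, volumeType, volumeAZ string, iops *int64) (string, error)
}

// fakeZonalService stands in for a provider whose API calls cannot be made
// without a zone. It performs no network calls.
type fakeZonalService struct{}

func (fakeZonalService) CreateSnapshot(volumeID, volumeAZ string) (string, error) {
	if volumeAZ == "" {
		return "", errors.New("zone is required to address volume " + volumeID)
	}
	return "snap-of-" + volumeID + "-in-" + volumeAZ, nil
}

func (fakeZonalService) CreateVolumeFromSnapshot(snapshotID, volumeType, volumeAZ string, iops *int64) (string, error) {
	if volumeAZ == "" {
		return "", errors.New("zone is required to restore snapshot " + snapshotID)
	}
	return volumeAZ + "/vol-from-" + snapshotID, nil
}

func main() {
	var svc SnapshotService = fakeZonalService{}

	// The zone would come from the PV's Kubernetes metadata at backup time
	// and from the stored backup info at restore time.
	snapID, err := svc.CreateSnapshot("pd-abc123", "us-central1-a")
	fmt.Println(snapID, err)

	volID, err := svc.CreateVolumeFromSnapshot(snapID, "pd-standard", "us-central1-a", nil)
	fmt.Println(volID, err)
}
```

Passing the zone as a parameter keeps the provider adapters stateless with respect to zones, which is what allows one backup to cover volumes in several zones.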

@jrnt30
Contributor

jrnt30 commented Sep 26, 2017

@skriss I need to take a closer look at your first suggestion to understand the distinction there, but will let you know if I have any concerns. Otherwise the approach is in line with what I was thinking should be possible. Thanks for the validation and the heads-up on the GCP nuances.

@ashish-amarnath If you get stuck or want to pair, I would be happy to spend some time on Zoom or Screen Hero chatting about this depending on schedule alignment.

@ashish-amarnath
Member Author

@skriss: I understand your suggestions and you make valid points. Let me take another stab at it and update the PR. I will not be able to work on this for the next day or so.
@jrnt30: I am not stuck, but it would be great to pair. I will ping you on Slack in the next day or so.

@ashish-amarnath
Member Author

@skriss I've updated the PR to read the availability-zone info from the PV itself, instead of getting it from the cloud provider. This way we take a dependency on Kubernetes' labelling of AZs instead of on the cloud-provider APIs.

@jrnt30
Contributor

jrnt30 commented Oct 3, 2017

Thanks @ashish-amarnath! One thing we need you to do is sign off on the commits, please. Instructions can be found in the Contributing Docs.

},
},
{
name: "aws - dynamically provisioned volume id",
snapshotEnabled: true,
- pv: `{"apiVersion": "v1", "kind": "PersistentVolume", "metadata": {"name": "mypv"}, "spec": {"awsElasticBlockStore": {"volumeID": "aws://us-west-2a/vol-abc123"}}}`,
+ pv: `{"apiVersion": "v1", "kind": "PersistentVolume", "metadata": {"name": "mypv", "labels": {"failure-domain.beta.kubernetes.io/zone": "us-east-1c"}}, "spec": {"awsElasticBlockStore": {"volumeID": "aws://us-west-2a/vol-abc123"}}}`,
Contributor

It's a nit, but can we have the failure-domain label and the "AZ" nested in the volumeID match up, to be more logically accurate, please?

Member Author

Good catch! Will do.

@@ -67,8 +67,8 @@ func NewSnapshotService(blockStorage BlockStorageAdapter) SnapshotService {
}
}

- func (sr *snapshotService) CreateVolumeFromSnapshot(snapshotID string, volumeType string, iops *int64) (string, error) {
- volumeID, err := sr.blockStorage.CreateVolumeFromSnapshot(snapshotID, volumeType, iops)
+ func (sr *snapshotService) CreateVolumeFromSnapshot(snapshotID string, volumeType string, volAZ string, iops *int64) (string, error) {
Contributor

@jrnt30 jrnt30 Oct 3, 2017

It's a nit, but you can write this as snapshotID, volumeType, volAZ string, iops *int64 here.

Member Author

Agreed. I didn't want to change the style of the original code, but will change it.

@@ -88,10 +73,10 @@ func NewBlockStorageAdapter(region, availabilityZone string) (cloudprovider.Bloc
// from snapshot.
var iopsVolumeTypes = sets.NewString("io1")

- func (op *blockStorageAdapter) CreateVolumeFromSnapshot(snapshotID, volumeType string, iops *int64) (volumeID string, err error) {
+ func (op *blockStorageAdapter) CreateVolumeFromSnapshot(snapshotID, volumeType, volAZ string, iops *int64) (volumeID string, err error) {
Contributor

@jrnt30 jrnt30 Oct 3, 2017

Perhaps we should just spell this out fully as volumeAZ, similar to volumeType (everywhere).

Member Author

Good idea.

@@ -155,6 +155,30 @@ func TestVolumeSnapshotAction(t *testing.T) {
pv: `{"apiVersion": "v1", "kind": "PersistentVolume", "metadata": {"name": "mypv"}, "spec": {"gcePersistentDisk": {"pdName": "pd-abc123"}}}`,
expectError: true,
},
{
Contributor

Why do we need these 2 additional test cases? How are they different from what's above?

Member Author

These are no different from the other test cases and are redundant. Will remove them.

@@ -64,22 +60,11 @@ func NewBlockStorageAdapter(region, availabilityZone string) (cloudprovider.Bloc
return nil, err
}

// validate the availabilityZone
var (
Contributor

No need for a var block any more - please move this down below into the blockStorageAdapter assignment directly.

Member Author

Makes sense. Will do.

@@ -109,6 +109,10 @@ type VolumeBackupInfo struct {
// API.
Type string `json:"type"`

// AvailabilityZone is the where the volume is provisioned
// in the cloud provider.
AvailabilityZone string `json:"avaialabiltyZone"`
Contributor

@skriss skriss Oct 3, 2017

Small typo in the JSON tag: it should be json:"availabilityZone".

Member Author

Thanks for catching that!

@@ -59,6 +59,14 @@ func (a *volumeSnapshotAction) Execute(ctx ActionContext, volume map[string]inte

metadata := volume["metadata"].(map[string]interface{})
name := metadata["name"].(string)
labels := metadata["labels"]
Contributor

There are some helpers for navigating these maps in pkg/util/collections. You can get the labels map with labelsMap, err := collections.GetMap(metadata, "labels"). Unfortunately, you can't go straight to the zone right now because our helpers take dot-separated paths, which won't work given this label, but this should at least simplify getting the labels map.
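
A minimal sketch of why a dot-separated path cannot reach this particular label: the label key itself contains dots, so splitting the path on "." breaks the key apart. The `getValue` and `zoneOf` functions below are hypothetical stand-ins written for this illustration; they are not the helpers in pkg/util/collections, whose exact behavior is not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// getValue is a hypothetical dot-path lookup over nested maps. It splits the
// path on "." and descends one key at a time.
func getValue(root map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = root
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// zoneOf fetches the labels map by path, then indexes it with the full label
// key directly, so the dots inside the key are never split.
func zoneOf(pv map[string]interface{}) string {
	labels, ok := getValue(pv, "metadata.labels")
	if !ok {
		return ""
	}
	m, ok := labels.(map[string]interface{})
	if !ok {
		return ""
	}
	zone, _ := m[zoneLabel].(string)
	return zone
}

// samplePV builds an unstructured PV similar to the test fixtures above.
func samplePV() map[string]interface{} {
	return map[string]interface{}{
		"metadata": map[string]interface{}{
			"name":   "mypv",
			"labels": map[string]interface{}{zoneLabel: "us-west-2a"},
		},
	}
}

func main() {
	pv := samplePV()

	// The path is split into "failure-domain", "beta", ... so the lookup fails.
	_, found := getValue(pv, "metadata.labels."+zoneLabel)
	fmt.Println("dot-path lookup found the zone:", found)

	fmt.Println("two-step lookup:", zoneOf(pv))
}
```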

Member Author

Will do. That should simplify things a little bit.

@ashish-amarnath
Member Author

I am not sure what I am doing wrong for the sign-off not to be picked up. Any suggestions on how to resolve the sign-off?

@jrnt30
Contributor

jrnt30 commented Oct 4, 2017

I think the issue is that there is a mixture of signed and unsigned commits, and a few that are repeated. The good news is that it's fixable; the bad news is that it's a bit of a pain :)

This assumes that you have two remotes for your repo: origin, which points to the upstream Heptio Ark repo, and ashish-amarnath, which points to your own fork and is what your current branch tracks.

git fetch --all 			# ensure origin/master is updated
git rebase -i origin/master # rewrite your local history to fix a few issues

This should be the pick-list you need to collapse the incorrect commits to remove the duplicate sets of commits.

# pick 48459dc WIP: Preserve AWS EBS volume AZ when snapshotting and restoring PVs.
# drop d9cbaf8 WIP: Preserve AWS EBS volume AZ when snapshotting and restoring PVs.
# pick 7d91f34 WIP: Read availability zone on PVs from PV fault-domain label
# pick b286455 WIP: Remove AvailabilityZone config from AWS and GCP configuration
# drop 805b315 WIP: Read availability zone on PVs from PV fault-domain label
# drop 82744ef WIP: Remove AvailabilityZone config from AWS and GCP configuration
# pick f5e8565 Address code review comments:

If you do a git log after that, you should see a bit cleaner git history in relation to your branch and origin/master.

Test your stuff and, if that looks good, git push --force ashish-amarnath and let's see what the sign-off check does then.

NOTE: If you run into a problem, you should be able to git pull to reset and try again

@skriss
Contributor

skriss commented Oct 4, 2017

I'm happy with the code. We do need to update the docs and sample files to remove AZ references from config. The main places are:

examples/aws/00-ark-config.yaml
examples/gcp/00-ark-config.yaml
docs/config-definition.md

But I would also look around the docs/ folder a bit more to see if there are any other references.

@ncdc
Contributor

ncdc commented Oct 4, 2017

@ashish-amarnath when doing the interactive rebase, you could also choose to squash everything down to 1 commit if that makes sense. Let us know if you need any more guidance here. Thanks!

@ashish-amarnath
Member Author

@jrnt30 Thanks for the suggestions. I will follow them and reach out if I run into issues.
@skriss Will look at the docs to make changes.
@ncdc Squashing the commits makes sense.

@ncdc
Contributor

ncdc commented Oct 5, 2017

@ashish-amarnath when you have time, please rebase, squash down to 1 commit, and update docs as @skriss suggested. Thanks!

@ashish-amarnath
Member Author

@skriss, @ncdc, @jrnt30 This PR should be good to go.

@skriss
Contributor

skriss commented Oct 6, 2017

@ashish-amarnath which platforms (AWS, GCP, Azure) have you been able to test this on? I can help out with some testing on any of those.

@@ -109,6 +109,10 @@ type VolumeBackupInfo struct {
// API.
Type string `json:"type"`

// AvailabilityZone is the where the volume is provisioned
// in the cloud provider.
AvailabilityZone string `json:"availabilityZone"`
Contributor

Let's add an omitempty to the JSON tag (same as for Iops just below).
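
As a small illustration of what omitempty changes here: with it, a backup record for a volume that has no zone simply omits the key instead of writing an empty string. The struct below is a cut-down, hypothetical copy made for this sketch; only the field names and tags quoted in this review are taken from the PR, and the Iops field type is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// volumeBackupInfo is a reduced, hypothetical version of the struct under
// review, kept to the fields mentioned in this thread.
type volumeBackupInfo struct {
	Type             string `json:"type"`
	AvailabilityZone string `json:"availabilityZone,omitempty"`
	Iops             *int64 `json:"iops,omitempty"`
}

// encode marshals the value and returns the JSON text, or the error text if
// marshalling fails.
func encode(v volumeBackupInfo) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// No zone recorded: the availabilityZone key is left out entirely.
	fmt.Println(encode(volumeBackupInfo{Type: "gp2"}))

	// Zone recorded: the key is written under the tag name.
	fmt.Println(encode(volumeBackupInfo{Type: "gp2", AvailabilityZone: "us-west-2a"}))
}
```

Omitting the key also keeps records written before this change and records for zone-less volumes looking the same on disk.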

var pvfailureDomainZone string
labelsMap, err := collections.GetMap(metadata, "labels")
if err == nil {
pvfailureDomainZone = labelsMap["failure-domain.beta.kubernetes.io/zone"].(string)
Contributor

@skriss skriss Oct 6, 2017

We need a nil-check here when getting failure-domain.beta.kubernetes.io/zone out of the map, to handle the case where this label doesn't exist (currently it panics when it tries to convert to string). Can you also make sure each situation has an appropriate log statement (no labels collection, no zone key, or zone found)? Please also add a unit test for the "no zone label" scenario. Thanks!
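
One way to avoid the panic described above is Go's two-value ("comma ok") type assertion, which reports failure instead of panicking. The sketch below assumes a hypothetical helper named `zoneFromMetadata` that returns the zone plus a short reason string the caller could log; it is not the code that was merged.

```go
package main

import "fmt"

const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// zoneFromMetadata is a hypothetical helper for this sketch. It returns the
// zone (empty if unavailable) and a reason describing which of the three
// situations applied, so the caller can log each one distinctly.
func zoneFromMetadata(metadata map[string]interface{}) (zone string, reason string) {
	// A two-value assertion on a missing (nil) entry yields ok == false
	// rather than panicking.
	labels, ok := metadata["labels"].(map[string]interface{})
	if !ok {
		return "", "no labels collection"
	}
	raw, ok := labels[zoneLabel]
	if !ok || raw == nil {
		return "", "no zone label"
	}
	zone, ok = raw.(string)
	if !ok {
		return "", "zone label is not a string"
	}
	return zone, "zone found"
}

func main() {
	withZone := map[string]interface{}{
		"name":   "mypv",
		"labels": map[string]interface{}{zoneLabel: "us-west-2a"},
	}
	noZone := map[string]interface{}{
		"name":   "mypv",
		"labels": map[string]interface{}{"app": "db"},
	}
	noLabels := map[string]interface{}{"name": "mypv"}

	for _, md := range []map[string]interface{}{withZone, noZone, noLabels} {
		zone, reason := zoneFromMetadata(md)
		fmt.Printf("zone=%q reason=%q\n", zone, reason)
	}
}
```

The "no zone label" case in `main` corresponds to the unit test requested in the comment above.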

@skriss
Contributor

skriss commented Oct 6, 2017

@ashish-amarnath I was able to test this on all three platforms. There are two minor issues, which I added comments for, but otherwise it's looking good!

- Read PV's AZ info from fault-domain label of the PV object for snapshotting.
- Store PV's AZ info in the VolumeInfo.
- Add tests for reading the label from the PV object.
- Remove availability zone validation in AWS and GCP BlockStorageAdapter.
- Add volumeAZ as a parameter to methods in the BlockStorageAdapter interface.
- Get AZ from VolumeInfo when restoring PV snapshot.
- Remove references to PV availability zone in docs.

Signed-off-by: Ashish Amarnath <[email protected]>
if labelsMap["failure-domain.beta.kubernetes.io/zone"] != nil {
pvfailureDomainZone = labelsMap["failure-domain.beta.kubernetes.io/zone"].(string)
} else {
ctx.log("error getting 'failure-domain.beta.kubernetes.io/zone' label on PersistentVolume %q for backup %q.\n", name, backupName)
Contributor

No \n needed

Contributor

Please update the log message to indicate that this label isn't present

ctx.log("error getting 'failure-domain.beta.kubernetes.io/zone' label on PersistentVolume %q for backup %q.\n", name, backupName)
}
} else {
ctx.log("error getting labels on PersistentVolume %q for backup %q. ", name, backupName)
Contributor

Please remove the extra space at the end of the string

Contributor

Please include the error via %v

@@ -60,6 +61,17 @@ func (a *volumeSnapshotAction) Execute(ctx ActionContext, volume map[string]inte

metadata := volume["metadata"].(map[string]interface{})
name := metadata["name"].(string)
var pvfailureDomainZone string
Contributor

Capital F for Failure

@skriss
Contributor

skriss commented Oct 10, 2017

re-tested on Azure & GCP; looks good

@skriss
Contributor

skriss commented Oct 10, 2017

ok to test

@skriss
Contributor

skriss commented Oct 10, 2017

good to merge. we'll tweak the log messages in a follow-up. @ashish-amarnath thanks again for the contribution!!

@skriss skriss changed the title from "WIP: Preserve AWS EBS volume AZ when snapshotting and restoring PVs." to "Preserve AWS & GCP volume AZ when snapshotting and restoring PVs." Oct 10, 2017
@skriss skriss merged commit d27b163 into vmware-tanzu:master Oct 10, 2017
@ncdc ncdc added this to the v0.5.0 milestone Oct 10, 2017
@ashish-amarnath
Member Author

@skriss Thanks for the merge!
@ncdc If the logging changes that you requested didn't make it in, I will follow up with another PR just for the logging changes.

@skriss
Contributor

skriss commented Oct 11, 2017

@ashish-amarnath I addressed the requested changes from @ncdc so we're good here. Thanks!

@ashish-amarnath ashish-amarnath deleted the support-multi-az-pvs branch October 24, 2017 21:47
jmontleon pushed a commit to jmontleon/velero that referenced this pull request Jul 7, 2021
alromeros pushed a commit to alromeros/velero that referenced this pull request Oct 25, 2024
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](stretchr/testify@v1.8.0...v1.8.1)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
5 participants