-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ItemBlock model and phase 1 (single-thread) workflow changes #8102
ItemBlock model and phase 1 (single-thread) workflow changes #8102
Conversation
e16ef18
to
67c30b7
Compare
Converting to Draft for now until:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8102 +/- ##
==========================================
+ Coverage 59.09% 59.23% +0.14%
==========================================
Files 364 366 +2
Lines 30340 30520 +180
==========================================
+ Hits 17929 18080 +151
- Misses 10968 10986 +18
- Partials 1443 1454 +11 ☔ View full report in Codecov by Sentry. |
e28c959
to
68cee28
Compare
68cee28
to
3081b3d
Compare
3081b3d
to
b5db272
Compare
pkg/backup/backup.go
Outdated
itemBlock.Log.WithError(err).WithField("name", item.Item.GetName()).Error("Error running hooks for pod") | ||
// if pre hook fails, flag pod as backed-up and move on | ||
itemBlock.itemBackupper.backupRequest.BackedUpItems[key] = struct{}{} | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we skip the pod, what about the related items in the same itemBlock? i.e., PVCs, secects, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pod hook API's "fail on error" only calls for the pod itself to fail. Current Velero logic won't skip PVC, secrets, etc. either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. We do see some problems here though they also happen with the existing code:
- The hook may be used to quiescence the application or prepare the backup data to the volume
- When the hook fails and the pod is skipped, but the PVC may be still backed up
- The consequence is that users either get a corrupted backup data or empty backup data
Do you think we can enhance this in future based on the new itemBlock implementation? Or do you think it is even harder to fix it with the new implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lyndon-Li I think we should defer this to the future, since we'll need to think carefully about the design. Do we always want to not restore items in the block that were returned by any IBA plugins for the pod if the pre-hook fails, or should this be part of the IBA plugin return value? If it's plugin-decided rather than optional, then we'll need to modify the IBA plugin return (with a v2 plugin). If we decide that we always want to skip related items for a pod if a hook fails, then we won't need that. But also, the model here doesn't track which item's IBA plugin (if any) was responsible for pulling the item into the block, so we'd need to add that too.
Also, this could introduce as many problems as it solves. If there are 2 pods in the block, either because of a shared RWX volume, or (in the future) due to the pods having volumes in the same VolumeGroup, then should one failed pod pre-hook cause all of the pods to be skipped in the backup? I think this should be a separate design discussion post-1.15.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be sufficient to just fail the pod -- the user will see the PartiallyFailed backup, note that the error was on a stateful pod and therefore assume the backup is bad anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree that if we decide to make a fix here in the future, it will be easier to do with the ItemBlock implementation than without.
itemGV, err := schema.ParseGroupVersion(item.Item.GetAPIVersion()) | ||
if err == nil && item.PreferredGVR.GroupVersion() == itemGV { | ||
returnList = append(returnList, item) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic does not look right.
The error branch will append the item even if the error was reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blackpiglet this func returns a list of items matching the GR (could be more than one if EnableAPIGroupVersions is set). The purpose of the PreferredGVR comparison is that we want to return the item corresponding to the preferred GVR first, followed by the rest. So we loop over the list and when we find an item that's the preferred GVR, we add it to returnList, else we add to itemList -- then at the end, we append itemList to preferredList. So after filtering out the non-matching items (first if
), there are 3 possibilities:
- We parse preferredGVR, compare it to item GVR and it's a match, so put the item at the front of the list (if branch)
- We parse preferredGVR, compare it to item GVR and it's not a match, so put the item in the "other items" list (else branch)
- We fail to parse preferredGVR, so we just put the item in the "other items" list (else branch).
Basically if we ever fail to parse preferred GVR, then we just return the list of items in the order we have them. If we parse it and there's a match (there should be), then that one goes first. We really don't expect this to ever fail, since the preferred GVR string was determined by the item collector, so there's no reason it would fail parsing -- but since ParseGroupVersion has the possibility of returning an error, we need to handle that unlikely case -- but we don't want to error out if it happened, since it would be a non-fatal error. Maybe I should add a debug log here in the case that error is not nil and separate that if
from the comparison test to make the logic clearer, though.
"namespace": namespace, | ||
}) | ||
|
||
func (ib *itemBackupper) itemInclusionChecks(log logrus.FieldLogger, mustInclude bool, metadata metav1.Object, obj runtime.Unstructured, groupResource schema.GroupResource) bool { | ||
if mustInclude { | ||
log.Infof("Skipping the exclusion checks for this resource") | ||
} else { | ||
if metadata.GetLabels()[velerov1api.ExcludeFromBackupLabel] == "true" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to check the GroupResource here.
Only the PVC and PV need to go through the PV skip tracking logic.
I know it's legacy code, but the change could save some condition checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. So the tracking logic is only needed for PV/PC, but the exclusion checks overall apply to all types. i.e. any item is excludable, but the skipped PV tracking is only relevant for PVC/PV -- but that func already checks inside it. ExcludeFromBackupLabel, excluded namespace or resource type, etc. All of those checks are needed for even non-PV/PVC resources.
Maybe it's better to add a future enhancement, but it's better to know how the backed-up resources are organized with the ItemBlock, for example it can be displayed in the |
@blackpiglet Hmm. Yeah, we could definitely do that as a future enhancement. I guess we'd need to generate some sort of yaml file to upload to object storage -- basically when we generate an ItemBlock, before passing it on to start backup we could append yaml metadata, and at the end, we could upload this yaml to object store. I think it's fine to do in the future, since it won't really affect much of this code and we're too late in the 1.15 cycle for new design, etc. |
Signed-off-by: Scott Seago <[email protected]>
b5db272
to
9d6f4d2
Compare
Updated PR based on feedback. |
Thank you for contributing to Velero!
Please add a summary of your change
This PR adds the ItemBlock model and introduces the workflow changes for (single-threaded) ItemBlock functionality. ItemBlocks are created based on:
ItemBlocks are then backed up together using the new kubernetesBackupper.backupItemBlock func which
This change does not yet introduce worker threads to back up ItemBlocks in parallel -- that is out of scope for Velero 1.15
Does your change fix a particular issue?
Third PR for #7900
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.