-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: enable drift detection + takeover experience in work applier [full DRAFT] #950
base: main
Are you sure you want to change the base?
feat: enable drift detection + takeover experience in work applier [full DRAFT] #950
Conversation
|
||
// If the Work object is not yet available, reconcile again in 5 seconds. | ||
if !isWorkObjectAvailable(work) { | ||
return ctrl.Result{RequeueAfter: availabilityCheckRequeueAfter}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this interval be tuned based on the type of resources or the progress?
identifier.Kind = manifestObj.GetKind() | ||
identifier.Name = manifestObj.GetName() | ||
identifier.Namespace = manifestObj.GetNamespace() | ||
identifier.GenerateName = manifestObj.GetGenerateName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? AFAIK, this field and name are mutually exclusive
// | ||
// This check is done on the Work object scope, and is primarily added to address the case | ||
// where duplicate objects might appear in a Fleet resource envelope and lead to unexpected | ||
// behaviors. Duplication is a non-issue without Fleet resource envelopes, as the Fleet hub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This statement is not true since we apply override policies.
checked[wriStr] = true | ||
|
||
// Prepare the manifest conditions for the write-ahead process. | ||
manifestCondForWA := prepareManifestCondForWA(wriStr, bundle.id, work.Generation, existingManifestCondQIdx, work.Status.ManifestConditions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "WA" stand for?
bundle.availabilityResTyp = ManifestProcessingAvailabilityResultTypeFailed | ||
klog.ErrorS(err, | ||
"Failed to track the availability of the applied object in the member cluster", | ||
"work", *workRef, "GVR", *bundle.gvr, "inMemberClusterObj", klog.KObj(bundle.inMemberClusterObj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why pass workRef as a pointer and then get the value of it?
) { | ||
// Firstly, attempt to find if an object has been created in the member cluster based on the manifest object. | ||
inMemberClusterObj, err := r.findInMemberClusterObjectFor(ctx, | ||
bundle.gvr, bundle.manifestObj, bundle.manifestCondIdx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is bundle.manifestCondIdx set? It's not set in preProcessManifests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should have set it prepareManifestProcessingBundles?
inMemberClusterObjCopyJSONBytes, err := inMemberClusterObjCopy.MarshalJSON() | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to marshal the object from the member cluster into JSON: %w", err) | ||
} | ||
appliedObjJSONBytes, err := appliedObjCopy.MarshalJSON() | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to marshal the object returned by the dry-run apply op into JSON: %w", err) | ||
} | ||
|
||
// Compare the JSON representations. | ||
patch, err := jsondiff.CompareJSON(appliedObjJSONBytes, inMemberClusterObjCopyJSONBytes) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to compare the JSON representations of the the object from the member cluster and the object returned by the dry-run apply op: %w", err) | ||
} | ||
|
||
details, err := organizeJSONPatchIntoFleetPatchDetails(patch, appliedObjCopy.Object, inMemberClusterObj.Object) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to organize the JSON patch into Fleet patch details: %w", err) | ||
} | ||
return details, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reuse this for full comparision?
klog.ErrorS(err, "Failed to decode the manifest", "ordinal", pieces, "work", klog.KObj(work)) | ||
bundle.applyErr = fmt.Errorf("failed to decode manifest: %w", err) | ||
bundle.applyResTyp = ManifestProcessingApplyResultTypeDecodingErred | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't return any error back to the caller, thus there are bundles that have no gvr/ManifestObj. However, those are used extensively as pointers in the rest part of the controller logic. It seems that this can lead to Nullptr panic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that there are checks like "bundle.applyErr != nil" on some places but I am not sure if it covers all cases. Maybe we can add a step after preProcessManifests to remove those from the bundle?
// * Fleet reports that the manifest has been applied before (i.e., inMemberClusterObj exists); and | ||
// * The apply strategy dictates that an apply op should only run if there is no | ||
// detected drift; and | ||
// * The hash of the manifest object is consistently with that bookkept in the live object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// * The hash of the manifest object is consistently with that bookkept in the live object | |
// * The hash of the manifest object is not consistent with that book keeping annotation in the live object |
} | ||
|
||
inMemberClusterObjLastAppliedManifestObjHash := inMemberClusterObj.GetAnnotations()[fleetv1beta1.ManifestHashAnnotation] | ||
return manifestObjHash == inMemberClusterObjLastAppliedManifestObjHash, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why check if the manifest matches?
@@ -0,0 +1,490 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make sure that the previous test and integration test still works
Description of your changes
This PR includes a new implementation of the work applier that enables the drift detection + takeover experience.
Important:
Please note that this PR is submitted as a draft to provide a preview of the whole implementation. Though the flow has been fully implemented, there might still be incoming changes. The current focus is to provide a fully functional preview for interested parties; additional tests are currently being developed. When the code is ready, it will be submitted in smaller chunks. To make reviewing (and PR submissions) easier, the implementation is organized as a standalone controller that can replace the current work applier, rather than directly patching the existing code, even though codes are being reused as appropriate.
I have:
make reviewable
to ensure this PR is ready for review.How has this code been tested
N/A
Special notes for your reviewer