-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance BuildRun reconciles failure scenarios #641
Enhance BuildRun reconciles failure scenarios #641
Conversation
f96dfcf
to
dc7bb3b
Compare
9b5c03b
to
6ade3c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress. Did a first run through it.
pkg/reconciler/buildrun/buildrun.go
Outdated
updateErr := r.updateBuildRunErrorStatus(ctx, buildRun, err.Error()) | ||
return nil, resources.HandleError("Failed to choose a service account to use", err, updateErr) | ||
func (r *ReconcileBuildRun) getReferencedStrategy(ctx context.Context, build *buildv1alpha1.Build, buildRun *buildv1alpha1.BuildRun) (strategy buildv1alpha1.BuilderStrategy, err error) { | ||
if build.Spec.StrategyRef.Kind == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the old code assumed a namespaced build strategy when no kind was set. I like to make it mandatory. But instead of code-validation, I prefer schema validation and therefore propose to change the type of Kind
in StrategyRef
from *BuildStrategyKind
to BuildStrategyKind
(in buildstrategy.go).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code assumes we dont default to anything but rather fail, I think this comment goes into the direction of #657, which as you could see have no consensus yet. So we might also tackle this one via that issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using 657 sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After 657 discussions and also keeping in mind this was using a default before, I added one more commit that inlines with the new logic and that supports a default, see 796f5ae
updateErr := r.updateBuildRunErrorStatus(ctx, buildRun, err.Error()) | ||
return nil, resources.HandleError("failed to set OwnerReference for BuildRun and TaskRun", err, updateErr) | ||
default: | ||
err = fmt.Errorf("unknown strategy %s", string(*build.Spec.StrategyRef.Kind)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to handle this with schema validation as well ? OpenAPI supports enums. We currently define BuildStrategyKind like this in buildstrategy.go:
type BuildStrategyKind string
In typescript I would simply write:
type BuildStrategyKind = 'BuildStrategy' | 'ClusterBuildStrategy'
Not sure if something like this is possible in go and supported by the kube typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Im totally fine with the way it is, it actually allowed me to simplify this code into a switch clause.
} | ||
ctxlog.Info(ctx, "created serviceAccount for BuildRun", namespace, buildRun.Namespace, name, serviceAccount.Name, "BuildRun", buildRun.Name) | ||
// add the secrets references into the new sa | ||
ApplyCredentials(ctx, build, serviceAccount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ApplyCredentials
does not do a save. You need to move this to before the client.Create
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! thanks, fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this deserves new integration tests., Im surprised this didnt break anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im surprised this didnt break anything
Me too. :-) I think integration and e2e test use an anonymous registry. Is not too difficult to change and we should do that probably soon. And private git registry tests in our todo list for next week. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extended the creates a new service-account and deletes it after the build is terminated
to check if an output secret exists on the autogenerated sa
866c253
to
2a2f435
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress. Two more remaining things from my site.
@@ -58,7 +58,7 @@ func add(ctx context.Context, mgr manager.Manager, r reconcile.Reconciler, maxCo | |||
|
|||
// The CreateFunc is also called when the controller is started and iterates over all objects. For those BuildRuns that have a TaskRun referenced already, | |||
// we do not need to do a further reconciliation. BuildRun updates then only happen from the TaskRun. | |||
return o.Status.LatestTaskRunRef == nil | |||
return o.Status.LatestTaskRunRef == nil && o.Status.CompletionTime == nil | |||
}, | |||
UpdateFunc: func(e event.UpdateEvent) bool { | |||
// Ignore updates to CR status in which case metadata.Generation does not change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would feel more comfortable if the update condition is
// Avoid reconciling when for updates on the BuildRun, the build.shipwright.io/name
// label is set, and when a BuildRun already have a referenced TaskRun.
if o.GetLabels()[buildv1alpha1.LabelBuild] == "" || o.Status.LatestTaskRunRef != nil || o.Status.CompletionTime != nil {
return false
}
I do not think we have any use case to reconcile a completed BuildRun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
_, obj, _ := client.CreateArgsForCall(0) | ||
serviceAccount, castSuccessful := obj.(*corev1.ServiceAccount) | ||
Expect(castSuccessful).To(BeTrue()) | ||
It("fails on a TaskRun creation due to unknown buildStrategy kind", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the kind is missing, the fallback to the namespaced strategy is back. I think the unit test "only" fails to create the TaskRun because the namespaced build strategy does not exist. That's complicated to understand, I think. Would maybe be better if the test case would mock the namespaced build strategy to exist and then explicitly test this fallback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two tiny things I would suggest to change.
16f1dd9
to
7d3d2b6
Compare
Ensure we have a customize error for Status Update failures Ensure we have an error struct that can support multiple errors Ensure we have a condition func to mark BuildRuns as Failed Ensure we have a func for BuildRun objects to understand if the BuildRun is failed or not. Signed-off-by: Matthias Diester <[email protected]>
Mainly done for two reasons. One for simplifying the call, second to stop using the controllerutil.CreateOrUpdate() func, which was not possible to test via the unit-test fakes. Signed-off-by: Matthias Diester <[email protected]>
This increases the coverage for the service account management in buildruns and provides support for most of all the scenarios we have there.
Simplify the createTaskRun func() Simplify the strategies retrieval and make it a standalone step prior to the TaskRun spec generation. Ensure the serviceAccount retrieval is a standalone step prior to the TaskRun spec generation. For both serviceAccount and strategies retrieval, fail the TaskRun if any of the custom checks take place or if a referenced object is not found, otherwise allow one more reconciliation on sys call failures. Signed-off-by: Matthias Diester <[email protected]>
On Create predicate events, do not reconcile if the CompletionTime is set for BuildRuns. When retrieving a Build object, reconcile again if is a sys call or mark the BuildRun as failed if the object was not found. When the Build object registration failed, mark the BuildRun as failed. When creating the TaskRun, allow to reconcile one more time on errors. Remove obsolete functions for updating BuildRun conditions. Add unit-tests for Build object retrieval.
This test the behaviour of the BuildRun on custom validations, asserting for a False Status in the Conditions or for an error after reconciliation, meaning we allow one more reconcile.
To reflect test coverage on BuildRun failures and to asser for the correct Condition fields inside the object.
Add new custom Reason valued when we have a Condition with a False Status.
When a Build do not specify an strategy Kind, ensure we default to a namespaced scope one. This logic was already in placed, a previous commit removed and this commit added it again in a different function. Also, it should fix shipwright-io#657
Introduce well defined test cases for: - BuildRun with no strategy kind defined should default to namespaced strategy and it should find the one that is in the system - BuildRun with no strategy kind defined should default to namespaced strategy and it should **fail** if there is no such strategy Signed-off-by: Matthias Diester <[email protected]>
Ignore a potential follow-up error by explicitly setting it to undef. Remove unwanted new-line. Co-authored-by: Matthias Diester <[email protected]>
I'll /approve and allow someone from CodeEngine to apply the lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gabemontero The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Changes
Fixes #558
This PR addresses the following:
FAILED
, update its Status.Conditions and to stop any further reconciliation (also known as one-shot approach). For errors that might appear as failures when doing system calls, we will allow another reconciliation.For the new custom checks that might lead to a failed BuildRun, we have the following states:
Before:
After:
Note: You can follow this PR by checking one commit at a time, it will simplify the review.
Submitter Checklist
See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.
Release Notes