-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Pipeline-level timeouts #355
Add Pipeline-level timeouts #355
Conversation
If someone with the appropriate privileges could label this as work in progress, I'd appreciate it. =) |
@abayer if you put |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments, didn't dig "too" deep as it's still wip 👼
/hold
@@ -17,12 +17,15 @@ limitations under the License. | |||
package v1alpha1 | |||
|
|||
import ( | |||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be separate import
"testing"
"time"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -238,15 +238,16 @@ func (c *Reconciler) reconcile(ctx context.Context, pr *v1alpha1.PipelineRun) er | |||
|
|||
if rprt != nil { | |||
c.Logger.Infof("Creating a new TaskRun object %s", rprt.TaskRunName) | |||
rprt.TaskRun, err = c.createTaskRun(c.Logger, rprt.ResolvedTaskResources.TaskName, rprt.TaskRunName, pr, rprt.PipelineTask, serviceAccount) | |||
rprt.TaskRun, err = c.createTaskRun(c.Logger, rprt.ResolvedTaskResources.TaskName, rprt.TaskRunName, pr, rprt.PipelineTask, serviceAccount, p.Spec.Timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we really need to refactor this createTaskRun
method 👼
(but it's not the PR's role to do it 👼 )
@@ -44,6 +42,8 @@ import ( | |||
coreinformers "k8s.io/client-go/informers/core/v1" | |||
"k8s.io/client-go/kubernetes" | |||
"k8s.io/client-go/tools/cache" | |||
"reflect" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before for the imports 👼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -97,6 +98,13 @@ func PipelineTask(name, taskName string, ops ...PipelineTaskOp) PipelineSpecOp { | |||
} | |||
} | |||
|
|||
// PipelineTimeout sets the timeout to the PipelineSpec. | |||
func PipelineTimeout(duration *metav1.Duration) PipelineSpecOp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why PipelineTimeout
takes a metav1.Duration
but PipelineRunStartTime
task a time.Time
— aka why not using time.Duration
on PipelineTimeout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No actual good reason - just the hodgepodge of things I derived from. i.e., Build
's timeout is a metav1.Duration
, so I assumed I should do the same thing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR and awesome work. I have some minor feedback with code but here is my concern in terms of design
- If user configures timeout in pipelinerun, then all corresponding taskruns created should include parent object timeout
This PR is doing this (👍 ) but logic seems a bit complex
if pipelinerun creates a taskrun then taskrun timeout duration should be set to(pipelinerun start time + timeout duration) - current time
.
You could set this time toTaskspec.Timeout
and do not need new spec variable. Logic is already present in taskrun to check for this timeout. - This PR introduces additional timeout in taskrun spec and its confusing for a user who is building just taskrun yaml. Why would I use one over another? and why do we need 2 timeouts ?
@@ -106,6 +107,12 @@ type PipelineRunStatus struct { | |||
// In #107 should be updated to hold the location logs have been uploaded to | |||
// +optional | |||
Results *Results `json:"results,omitempty"` | |||
// StartTime is the time the build is actually started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: I think you meant "PipelineRun" here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
// StartTime is the time the build is actually started. | ||
// +optional | ||
StartTime *metav1.Time `json:"startTime,omitempty"` | ||
// CompletionTime is the time the build completed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -51,6 +51,9 @@ type TaskRunSpec struct { | |||
TaskRef *TaskRef `json:"taskRef,omitempty"` | |||
//+optional | |||
TaskSpec *TaskSpec `json:"taskSpec,omitempty"` | |||
// PipelineRunTimeout is the time at which the entire PipelineRun times out. Defaults to nil. | |||
// +optional | |||
PipelineRunTimeout *metav1.Time `json:"pipelineRunTimeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not certain about the name of this variable to be "PipelineRunTimeout". TaskRun should not be referencing pipelinerun
is names. Why not timeout ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, nuked all that from orbit.
var after *duckv1alpha1.Condition | ||
before := tr.Status.GetCondition(duckv1alpha1.ConditionSucceeded) | ||
|
||
if !tr.Status.StartTime.IsZero() && tr.Spec.PipelineRunTimeout != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked the pattern in pipelinerun controller where InitializeConditions
function took care of initializing StartTime as well. I would recommend doing same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a timeout in the taskrun
from the TaskSpec. Currently there are two ways to get the task spec for the taskrun
, either directly from the taskSpec
field or from the Task
referenced from the taskRef
field (taskSpec and taskRef are mutually exclusive). So I don't think its necessary to define another field on the task level
The pipeline timeout should also override the task(s) timeouts, but take into account the actual time used in a task, for example if task one has a timeout of 5 seconds but actually is completed in 2 seconds, these 3 seconds should be added to the following task while respecting the pipeline timeout
I'm going to pick this back up next week - hopefully #381 gets merged soon so I can redo some aspects of this on top of it. =) |
ab4a51c
to
5acbad6
Compare
#381 is merged, I've rebased this, and tomorrow I'll be working on responding to feedback. =) |
...and yes, the tests currently hang. Will fix tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really coming along!! I left a smattering of feedback, nothing too serious
Main thoughts:
- How does it work after a PipelineRun times out, how do we know any TaskRuns it created will also time out?
- Could you add some docs as well (maybe in using.md?) on how this will work
❤️
Generation int64 `json:"generation,omitempty"` | ||
Tasks []PipelineTask `json:"tasks"` | ||
// Time after which the build times out. Defaults to never. | ||
// Specified build timeout should be less than 24h. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a specific reason why less than 24h? (e.g. something we can link to) otherwise i wonder if we really should set an upper bound at all 🤔 people might want to do some crazy things
(ps. maybe update the comment here to say "pipeline" instead of "build"?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question - I just assumed we'd want a max, but I'll yank it out for now.
|
||
if ps.Timeout != nil { | ||
// timeout should be a valid duration of between 0 and 24 hours. | ||
maxTimeout := time.Duration(24 * time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think declaring this at the top of the file as a var
is consistent with how we declare other values like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could even be a constant 👼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've yanked this out anyway. =)
} else if ts.PipelineRunTimeout.Time.Before(time.Now()) { | ||
return apis.ErrInvalidValue(fmt.Sprintf("Time from now until timeout %s should be >= 0", ts.PipelineRunTimeout.Sub(time.Now()).Truncate(time.Second).String()), "spec.pipelineruntimeout") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this logic be shared with the pipeline logic? e.g. moved into a module they can both call (i think it's the same)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after removing PipelineRunTimeout
, this goes away too, so hey. =)
// If a timeout is specified for the PipelineRun, set the timeout for the task to be the time from now until the | ||
// PipelineRun timeout would occur. | ||
return &metav1.Time{Time: pr.Status.StartTime.Add(pipelineTimeout.Duration)} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like that this function doesn't depend on the reconciler at all! and b/c of that, we could easily move it into a separate module, and even have a unit test or two for it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May even be attached to PipelineRun
struct 👼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aaaaaand I've gotten rid of this and just moved the equivalent logic within createTaskRun
, but I can extract it again if you'd like?
tb.PipelineRunServiceAccount("test-sa"), | ||
), | ||
tb.PipelineRunStatus( | ||
tb.PipelineRunStartTime(time.Now().AddDate(0, 0, -1))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice :D
if !startTime.IsZero() && pipelineTimeout != nil { | ||
timeout := pipelineTimeout.Duration | ||
runtime := time.Since(startTime.Time) | ||
if runtime > timeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic to determine if a timeout has occurred seems to me like a good candidate for moving into a separate function with some tests (e.g. hasTimedOut
)
Status: corev1.ConditionFalse, | ||
Reason: ReasonTimedOut, | ||
Message: timeoutMsg, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after this happens, does anything need to be done to make sure the taskruns that the pipelinerun spawned will also time out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum yeah I think we need to cancel the TaskRun
s there 👼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TaskRun
s should themselves time out now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abayer oh right, TaskRun
s will also have the timeout field, and thus should time out on their own in the same time more or less, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I don't see the code for that though 🤔 Am I missing something ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vdemeester See the changes tocreateTaskRun
in https://github.com/knative/build-pipeline/pull/355/files#diff-225ed18b17b573cba99d592dea1da1a0R308
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abayer nevermind, I was looking in the diff for support for Timeout
on Task.… but it's already supported (because of build) 😅
if err != nil { | ||
c.Recorder.Eventf(pr, corev1.EventTypeWarning, "TaskRunCreationFailed", "Failed to create TaskRun %q: %v", rprt.TaskRunName, err) | ||
return fmt.Errorf("error creating TaskRun called %s for PipelineTask %s from PipelineRun %s: %s", rprt.TaskRunName, rprt.PipelineTask.Name, pr.Name, err) | ||
} | ||
} | ||
|
||
before := pr.Status.GetCondition(duckv1alpha1.ConditionSucceeded) | ||
after := resources.GetPipelineConditionStatus(pr.Name, pipelineState, c.Logger) | ||
after := resources.GetPipelineConditionStatus(pr.Name, pipelineState, c.Logger, pr.Status.StartTime, p.Spec.Timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have this sneaking suspicion like it'd be a bit cleaner if we checked for the timeout after this call, instead of inside it, since it seems like the logic needs different inputs (i.e. start time and timeout)
but i dont feel too strongly, also seems weird to determine the condition should be X and then decide afterward it should actually be Y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this needs more thought.
before := tr.Status.GetCondition(duckv1alpha1.ConditionSucceeded) | ||
|
||
if !tr.Status.StartTime.IsZero() && tr.Spec.PipelineRunTimeout != nil { | ||
if time.Now().After(tr.Spec.PipelineRunTimeout.Time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could both controller share the timeout logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been yanked out here and is just using Build
's timeout.
|
||
before := tr.Status.GetCondition(duckv1alpha1.ConditionSucceeded) | ||
// TODO: Skipping all things pod/build-related if we've timed out, but this probably should be better. | ||
if after == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i feel like after == nil
is a bit hard to understand, i wonder if there's some way this could be something like:
if hasTimedOut() {
// do timeout stuff
} else {
// do all this pod stuff
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or even like
if hasTimedOut() {
// set some status stuff
// return
}
do pod stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this has all been yanked. =)
@pivotal-nader-ziada @shashwathi - Ah, I hadn't realized it was possible to use |
a632f7a
to
b945146
Compare
/hold |
...ok, I have no idea how to get rid of the hold label. =) |
/hold cancel |
Ah-ha! Thanks, @shashwathi! |
spec: | ||
timeout: 5m | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 nice!
@@ -528,3 +542,37 @@ func makeLabels(s *v1alpha1.TaskRun) map[string]string { | |||
func isDone(status *v1alpha1.TaskRunStatus) bool { | |||
return !status.GetCondition(duckv1alpha1.ConditionSucceeded).IsUnknown() | |||
} | |||
|
|||
func (c *Reconciler) checkTimeout(tr *v1alpha1.TaskRun, ts *v1alpha1.TaskSpec) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you pass the Delete
function into here, you can move this function into a separate package and have unit tests for it without involving the reconciler :D (for example see ResolveTaskResources)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, mostly done. Finishing now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take that back - I'm not sure where to move this function to. =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aw well, we can move it later :)
i would have maybe made a new package in https://github.com/knative/build-pipeline/tree/master/pkg/reconciler/v1alpha1/taskrun called timeout
or something
9b23a4b
to
7a87112
Compare
/test pull-knative-build-pipeline-integration-tests |
08c6aaf
to
ce606dd
Compare
Ok, fixed |
The following is the coverage report on pkg/.
|
Waiting on tektoncd/pipeline#355 to be merged to add top-level timeout.
@pivotal-nader-ziada @shashwathi @bobcatfish @tejal29 @vdemeester @imjasonh - sorry to ping you all directly, but I just wanted to see if there's anything else I need to work on here or if this is good to go? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code wise LGTM 👍
Related to knative/build#543 and slack discussion on handling timeout "out of resync", I wonder if we should go with that for the time being and refactor it once we find a better solution.
cc @imjasonh
@vdemeester Yeah, that’s my thinking - the bulk of this won’t change in that scenario, just the encapsulated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve 🎉 |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abayer, ImJasonH, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Waiting on tektoncd/pipeline#355 to be merged to add top-level timeout.
Waiting on tektoncd/pipeline#355 to be merged to add top-level timeout.
Waiting on tektoncd/pipeline#355 to be merged to add top-level timeout.
Waiting on tektoncd/pipeline#355 to be merged to add top-level timeout.
Waiting on tektoncd/pipeline#355 to be merged to add top-level timeout.
We were passing the original request where the body had been drained to the first interceptor meaning that the body to the first interceptor in the chain was always empty. Further, we were using the `GetBody` which can sometimes be empty leading to a panic. fixes tektoncd#355 Signed-off-by: Dibyo Mukherjee <[email protected]>
This allows configuring
Pipeline
timeouts that will be enforced for theirTaskRun
s appropriately as well, and also replicatesBuild
's timeout behavior onTaskRun
, since that didn't actually get moved over here earlier.Fixes #222