Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit events for all TaskRun lifecycle events #2329

Merged
merged 1 commit into from
May 1, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 39 additions & 0 deletions docs/events.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
<!--
---
linkTitle: "Events"
weight: 2
---
-->
# Events
afrittoli marked this conversation as resolved.
Show resolved Hide resolved

Tekton runtime resources, specifically `TaskRuns` and `PipelineRuns`,
emit events when they are executed, so that users can monitor their lifecycle
and react to it. Tekton emits [kubernetes events](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core), that can be retrieve from the resource via
`kubectl describe [resource]`.

No events are emitted for `Conditions` today (https://github.com/tektoncd/pipeline/issues/2461).

## TaskRuns

`TaskRun` events are generated for the following `Reasons`:
- `Started`: this is triggered the first time the `TaskRun` is picked by the
reconciler from its work queue, so it only happens if web-hook validation was
successful. Note that this event does not imply that a step started executing,
as several conditions must be met first:
- task and bound resource validation must be successful
- attached conditions must run successfully
- the `Pod` associated to the `TaskRun` must be successfully scheduled
afrittoli marked this conversation as resolved.
Show resolved Hide resolved
- `Succeeded`: this is triggered once all steps in the `TaskRun` are executed
successfully, including post-steps injected by Tekton.
- `Failed`: this is triggered if the `TaskRun` is completed, but not successfully.
Causes of failure may be: one the steps failed, the `TaskRun` was cancelled or
the `TaskRun` timed out.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it was skipped? (maybe to be added?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TaskRun is something that may only happen when a Pipeline is involved, so I did not look into that on this patch, where I focuses on events generated by the TaskRun controller.
There are two kind of skips:

  • skip because of a condition not met - which is easy, and we definitely can emit an event for that. I opened an issue which is related [feature] Conditions emit no events like other Tekton resources #2461
  • skip because it was not reached in the DAG: we don't do anything today we tasks that were not reached in the DAG because of previous failures, but it should be possible to process them as a post-run step in the pipelinerun reconciler

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, there's no taskrun at all, makes sense. those 2 skips are totally valid also. like you said, conditions aren't emitting events anyway, maybe we can revisit this then.

thanks!!


## PipelineRuns

`PipelineRun` events are generated for the following `Reasons`:
- `Succeeded`: this is triggered once all `Tasks` reachable via the DAG are
executed successfully.
- `Failed`: this is triggered if the `PipelineRun` is completed, but not
successfully. Causes of failure may be: one the `Tasks` failed or the
`PipelineRun` was cancelled.
2 changes: 2 additions & 0 deletions docs/pipelineruns.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ weight: 4
- [Specifying `LimitRange` values](#specifying-limitrange-values)
- [Configuring a failure timeout](#configuring-a-failure-timeout)
- [Cancelling a `PipelineRun`](#cancelling-a-pipelinerun)
- [Events](events.md#pipelineruns)



## Overview
Expand Down
1 change: 1 addition & 0 deletions docs/taskruns.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ weight: 2
- [Monitoring `Steps`](#monitoring-steps)
- [Monitoring `Results`](#monitoring-results)
- [Cancelling a `TaskRun`](#cancelling-a-taskrun)
- [Events](events.md#taskruns)
- [Code examples](#code-examples)
- [Example `TaskRun` with a referenced `Task`](#example-taskrun-with-a-referenced-task)
- [Example `TaskRun` with an embedded `Task`](#example-taskrun-with-an-embedded-task)
Expand Down
31 changes: 25 additions & 6 deletions pkg/reconciler/event.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ package reconciler

import (
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/equality"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/tools/record"
"knative.dev/pkg/apis"
Expand All @@ -27,20 +28,38 @@ const (
EventReasonSucceded = "Succeeded"
// EventReasonFailed is the reason set for events about unsuccessful completion of TaskRuns / PipelineRuns
EventReasonFailed = "Failed"
// EventReasonStarted is the reason set for events about the start of TaskRuns / PipelineRuns
EventReasonStarted = "Started"
)

// EmitEvent emits success or failed event for object
// if afterCondition is different from beforeCondition
// EmitEvent emits an event for object if afterCondition is different from beforeCondition
//
// Status "ConditionUnknown":
// beforeCondition == nil, emit EventReasonStarted
// beforeCondition != nil, emit afterCondition.Reason
//
// Status "ConditionTrue": emit EventReasonSucceded
// Status "ConditionFalse": emit EventReasonFailed
//
func EmitEvent(c record.EventRecorder, beforeCondition *apis.Condition, afterCondition *apis.Condition, object runtime.Object) {
if beforeCondition != afterCondition && afterCondition != nil {
// Create events when the obj result is in.
if !equality.Semantic.DeepEqual(beforeCondition, afterCondition) && afterCondition != nil {
// If the condition changed, and the target condition is not empty, we send an event
switch afterCondition.Status {
case corev1.ConditionTrue:
c.Event(object, corev1.EventTypeNormal, EventReasonSucceded, afterCondition.Message)
case corev1.ConditionUnknown:
c.Event(object, corev1.EventTypeNormal, afterCondition.Reason, afterCondition.Message)
case corev1.ConditionFalse:
c.Event(object, corev1.EventTypeWarning, EventReasonFailed, afterCondition.Message)
case corev1.ConditionUnknown:
if beforeCondition == nil {
// If the condition changed, the status is "unknown", and there was no condition before,
// we emit the "Started event". We ignore further updates of the "unknown" status.
c.Event(object, corev1.EventTypeNormal, EventReasonStarted, "")
} else {
// If the condition changed, the status is "unknown", and there was a condition before,
// we emit an event that matches the reason and message of the condition.
// This is used for instance to signal the transition from "started" to "running"
c.Event(object, corev1.EventTypeNormal, afterCondition.Reason, afterCondition.Message)
}
}
}
}
107 changes: 87 additions & 20 deletions pkg/reconciler/event_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,38 +21,45 @@ import (
"time"

corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/tools/record"
"knative.dev/pkg/apis"
)

func TestEmitEvent(t *testing.T) {
testcases := []struct {
name string
before *apis.Condition
after *apis.Condition
expectEvent bool
name string
before *apis.Condition
after *apis.Condition
expectEvent bool
expectedEvent string
}{{
name: "unknown to true",
name: "unknown to true with message",
before: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
},
after: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
Message: "all done",
},
expectEvent: true,
expectEvent: true,
expectedEvent: "Normal Succeeded all done",
}, {
name: "true to true",
before: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
LastTransitionTime: apis.VolatileTime{Inner: metav1.NewTime(time.Now())},
},
after: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
LastTransitionTime: apis.VolatileTime{Inner: metav1.NewTime(time.Now().Add(5 * time.Minute))},
},
expectEvent: false,
expectEvent: false,
expectedEvent: "",
}, {
name: "false to false",
before: &apis.Condition{
Expand All @@ -63,23 +70,74 @@ func TestEmitEvent(t *testing.T) {
Type: apis.ConditionSucceeded,
Status: corev1.ConditionFalse,
},
expectEvent: false,
expectEvent: false,
expectedEvent: "",
}, {
name: "unknown to unknown",
before: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
Reason: "",
Message: "",
},
after: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
Reason: "foo",
Message: "bar",
},
expectEvent: true,
expectedEvent: "Normal foo bar",
}, {
name: "true to nil",
after: nil,
before: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
},
expectEvent: true,
expectEvent: false,
expectedEvent: "",
}, {
name: "nil to true",
before: nil,
after: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
},
expectEvent: true,
expectEvent: true,
expectedEvent: "Normal Succeeded ",
}, {
name: "nil to unknown with message",
before: nil,
after: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
Message: "just starting",
},
expectEvent: true,
expectedEvent: "Normal Started ",
}, {
name: "unknown to false with message",
before: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
},
after: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionFalse,
Message: "really bad",
},
expectEvent: true,
expectedEvent: "Warning Failed really bad",
}, {
name: "nil to false",
before: nil,
after: &apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionFalse,
},
expectEvent: true,
expectedEvent: "Warning Failed ",
}}

for _, ts := range testcases {
Expand All @@ -90,12 +148,21 @@ func TestEmitEvent(t *testing.T) {

select {
case event := <-fr.Events:
if ts.expectEvent && event == "" {
t.Errorf("Expected event but got empty for %s", ts.name)
if event == "" {
// The fake recorder reported empty, it should not happen
t.Fatalf("Expected event but got empty for %s", ts.name)
}
case <-timer.C:
if !ts.expectEvent {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a bit confused by the cases that don't require an event, do you think it might make sense to have an event every time the condition changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only cases where we do not emit an event are:

  • there was no change in condition (except for LastTransitionTime)
  • afterCondition == nil

For the former I think we do not want an event, it may be we wrote the condition again and updated LastTransitionTime, but nothing really happened to the Condition, so no event.
For the latter, I could not think of a use case where afterCondition is nil, so I saw no point in adding that case now.

t.Errorf("Unexpected event but got for %s", ts.name)
// The fake recorder reported an event which we did not expect
t.Errorf("Unxpected event \"%s\" but got one for %s", event, ts.name)
}
if !(event == ts.expectedEvent) {
t.Errorf("Expected event \"%s\" but got \"%s\" instead for %s", ts.expectedEvent, event, ts.name)
}
case <-timer.C:
if ts.expectEvent {
// The fake recorder did not report, the timer timeout expired
t.Errorf("Expected event but got none for %s", ts.name)
}
}
}
Expand Down
2 changes: 2 additions & 0 deletions pkg/reconciler/taskrun/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ const (
resyncPeriod = 10 * time.Hour
)

// NewController instantiates a new controller.Impl from knative.dev/pkg/controller
afrittoli marked this conversation as resolved.
Show resolved Hide resolved
func NewController(images pipeline.Images) func(context.Context, configmap.Watcher) *controller.Impl {
return func(ctx context.Context, cmw configmap.Watcher) *controller.Impl {
logger := logging.FromContext(ctx)
Expand All @@ -66,6 +67,7 @@ func NewController(images pipeline.Images) func(context.Context, configmap.Watch
ConfigMapWatcher: cmw,
ResyncPeriod: resyncPeriod,
Logger: logger,
Recorder: controller.GetEventRecorder(ctx),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just for testing? that's a bit unfortunate if so :( probably worth it tho?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, injecting fakes in the context is the same mechanism that we use to inject fake k8s client-go and other fakes.

}

entrypointCache, err := pod.NewEntrypointCache(kubeclientset)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,8 +66,7 @@ func cloudEventDeliveryFromTargets(targets []string) []v1alpha1.CloudEventDelive
}

// SendCloudEvents is used by the TaskRun controller to send cloud events once
// the TaskRun is complete. `tr` is used to obtain the list of targets but also
// to construct the body of the
afrittoli marked this conversation as resolved.
Show resolved Hide resolved
// the TaskRun is complete. `tr` is used to obtain the list of targets
func SendCloudEvents(tr *v1alpha1.TaskRun, ceclient CEClient, logger *zap.SugaredLogger) error {
logger = logger.With(zap.String("taskrun", tr.Name))

Expand Down
20 changes: 13 additions & 7 deletions pkg/reconciler/taskrun/taskrun.go
Original file line number Diff line number Diff line change
Expand Up @@ -103,11 +103,16 @@ func (c *Reconciler) Reconcile(ctx context.Context, key string) error {

// If the TaskRun is just starting, this will also set the starttime,
// from which the timeout will immediately begin counting down.
tr.Status.InitializeConditions()
// In case node time was not synchronized, when controller has been scheduled to other nodes.
if tr.Status.StartTime.Sub(tr.CreationTimestamp.Time) < 0 {
c.Logger.Warnf("TaskRun %s createTimestamp %s is after the taskRun started %s", tr.GetRunKey(), tr.CreationTimestamp, tr.Status.StartTime)
tr.Status.StartTime = &tr.CreationTimestamp
if !tr.HasStarted() {
tr.Status.InitializeConditions()
// In case node time was not synchronized, when controller has been scheduled to other nodes.
bobcatfish marked this conversation as resolved.
Show resolved Hide resolved
if tr.Status.StartTime.Sub(tr.CreationTimestamp.Time) < 0 {
c.Logger.Warnf("TaskRun %s createTimestamp %s is after the taskRun started %s", tr.GetRunKey(), tr.CreationTimestamp, tr.Status.StartTime)
tr.Status.StartTime = &tr.CreationTimestamp
}
// Emit events
afterCondition := tr.Status.GetCondition(apis.ConditionSucceeded)
reconciler.EmitEvent(c.Recorder, nil, afterCondition, tr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a bit confused why this call to EmitEvent is needed in addition to the other one a bunch of lines down in the same function

is it possible we can get away with one? i do like the idea of only having to call this logic once instead of being aware of 2 places it is called

or if it is required maybe a comment explaining why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to emit the event every time the condition changes, because - as things are today - you could have more than one condition change in one reconcile cycle.

That could change if we added extra conditions in future, i.e. instead of only using Succeeded (unknown/true/false) we could use Started and Running too. In any case, I think it's worth sending the event as early as possible. This reconcile cycle will go on to create a pod and run several API calls, so we may be losing a few hundreds milliseconds if we wait.

If something is looking for the event to update the status of a check in github, half a second later makes a difference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could have more than one condition change in one reconcile cycle.

sounds like that's only in the started case? going from started to running - and shouldn't be expected from any other states?

anyway makes sense! i do think this is worth a comment explaining why the 2 calls but also wouldnt block on this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's only in that case. I promise I'll add a comment in a follow-up 👼

}

// If the TaskRun is complete, run some post run fixtures when applicable
Expand Down Expand Up @@ -200,7 +205,6 @@ func (c *Reconciler) Reconcile(ctx context.Context, key string) error {
if err = c.reconcile(ctx, tr, taskSpec, rtr); err != nil {
c.Logger.Errorf("Reconcile error: %v", err.Error())
}

// Emit events (only when ConditionSucceeded was changed)
after := tr.Status.GetCondition(apis.ConditionSucceeded)
reconciler.EmitEvent(c.Recorder, before, after, tr)
Expand Down Expand Up @@ -337,6 +341,9 @@ func (c *Reconciler) reconcile(ctx context.Context, tr *v1alpha1.TaskRun,
if k8serrors.IsNotFound(err) {
// Keep going, this will result in the Pod being created below.
} else if err != nil {
// This is considered a transient error, so we return error, do not update
// the task run condition, and return an error which will cause this key to
// be requeued for reconcile.
afrittoli marked this conversation as resolved.
Show resolved Hide resolved
c.Logger.Errorf("Error getting pod %q: %v", tr.Status.PodName, err)
return err
}
Expand Down Expand Up @@ -401,7 +408,6 @@ func (c *Reconciler) reconcile(ctx context.Context, tr *v1alpha1.TaskRun,
}

c.Logger.Infof("Successfully reconciled taskrun %s/%s with status: %#v", tr.Name, tr.Namespace, tr.Status.GetCondition(apis.ConditionSucceeded))

return nil
}

Expand Down
Loading