cancel_test doesn't tolerate redundant events #3366

Closed
mattmoor opened this issue Oct 11, 2020 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/flake Categorizes issue or PR as related to a flakey test lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mattmoor
Member

Expected Behavior

Consistently passing test 😉

Actual Behavior

Intermittent flake with:

        cancel_test.go:150: Making sure 2 events were created from pipelinerun with kinds map[PipelineRun:[cancel-me] TaskRun:[cancel-me-task-n9z44]]
        cancel_test.go:163: Expected 2 number of successful events from pipelinerun and taskrun but got 3; list of received events : "&v1.Event{TypeMeta:v1.TypeMeta{Kind:\"\", APIVersion:\"\"}, ObjectMeta:v1.ObjectMeta{Name:\"cancel-me-task-n9z44.163cf95e6d3a99c6\", GenerateName:\"\", Namespace:\"arendelle-46g2s\", SelfLink:\"/api/v1/namespaces/arendelle-46g2s/events/cancel-me-task-n9z44.163cf95e6d3a99c6\", UID:\"378a6231-6314-4737-b2d5-f79b65e97a9f\", ResourceVersion:\"4665\", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:\"\", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:\"webhook\", Operation:\"Update\", APIVersion:\"v1\", Time:(*v1.Time)(0xc000a2ac00), FieldsType:\"FieldsV1\", FieldsV1:(*v1.FieldsV1)(0xc000a2ac20)}}}, InvolvedObject:v1.ObjectReference{Kind:\"TaskRun\", Namespace:\"arendelle-46g2s\", Name:\"cancel-me-task-n9z44\", UID:\"6e90c588-0259-4c58-8014-fab9770f60cd\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"4657\", FieldPath:\"\"}, Reason:\"Failed\", Message:\"TaskRun \\\"cancel-me-task-n9z44\\\" was cancelled\", Source:v1.EventSource{Component:\"TaskRun\", Host:\"\"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, Count:1, Type:\"Warning\", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:\"\", Related:(*v1.ObjectReference)(nil), ReportingController:\"\", ReportingInstance:\"\"}, &v1.Event{TypeMeta:v1.TypeMeta{Kind:\"\", APIVersion:\"\"}, ObjectMeta:v1.ObjectMeta{Name:\"cancel-me.163cf95e6a3e3d18\", 
GenerateName:\"\", Namespace:\"arendelle-46g2s\", SelfLink:\"/api/v1/namespaces/arendelle-46g2s/events/cancel-me.163cf95e6a3e3d18\", UID:\"eaa8ecae-1b5f-4bb6-8043-8b5c7ad18c90\", ResourceVersion:\"4664\", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:\"\", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:\"webhook\", Operation:\"Update\", APIVersion:\"v1\", Time:(*v1.Time)(0xc000a2ae80), FieldsType:\"FieldsV1\", FieldsV1:(*v1.FieldsV1)(0xc000a2aea0)}}}, InvolvedObject:v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"arendelle-46g2s\", Name:\"cancel-me\", UID:\"5a802a02-28ec-471b-bd83-ecc1168e17d4\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"4655\", FieldPath:\"\"}, Reason:\"Failed\", Message:\"PipelineRun \\\"cancel-me\\\" was cancelled\", Source:v1.EventSource{Component:\"PipelineRun\", Host:\"\"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, Count:2, Type:\"Warning\", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:\"\", Related:(*v1.ObjectReference)(nil), ReportingController:\"\", ReportingInstance:\"\"}, &v1.Event{TypeMeta:v1.TypeMeta{Kind:\"\", APIVersion:\"\"}, ObjectMeta:v1.ObjectMeta{Name:\"cancel-me-task-n9z44.163cf95e6d3a99c6\", GenerateName:\"\", Namespace:\"arendelle-46g2s\", SelfLink:\"/api/v1/namespaces/arendelle-46g2s/events/cancel-me-task-n9z44.163cf95e6d3a99c6\", UID:\"378a6231-6314-4737-b2d5-f79b65e97a9f\", ResourceVersion:\"4669\", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, 
ext:63738026551, loc:(*time.Location)(0x4051540)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:\"\", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:\"webhook\", Operation:\"Update\", APIVersion:\"v1\", Time:(*v1.Time)(0xc000a2af20), FieldsType:\"FieldsV1\", FieldsV1:(*v1.FieldsV1)(0xc000a2af40)}}}, InvolvedObject:v1.ObjectReference{Kind:\"TaskRun\", Namespace:\"arendelle-46g2s\", Name:\"cancel-me-task-n9z44\", UID:\"6e90c588-0259-4c58-8014-fab9770f60cd\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"4657\", FieldPath:\"\"}, Reason:\"Failed\", Message:\"TaskRun \\\"cancel-me-task-n9z44\\\" was cancelled\", Source:v1.EventSource{Component:\"TaskRun\", Host:\"\"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63738026551, loc:(*time.Location)(0x4051540)}}, Count:2, Type:\"Warning\", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:\"\", Related:(*v1.ObjectReference)(nil), ReportingController:\"\", ReportingInstance:\"\"}"

There are a number of reasons that this could be emitting multiple events of the same kind, including possibly stale informer caches during the Reconcile that emits the stale event.

I'd think the test should probably check "at least once" for each of the two events instead? cc @afrittoli
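
A minimal sketch of what an "at least once" check could look like. It assumes a hypothetical, trimmed-down `event` struct in place of the real `corev1.Event` (only the involved object's kind and name plus the reason are modeled), and `checkAtLeastOnce` is a hypothetical helper, not the actual cancel_test code:

```go
package main

import "fmt"

// event is a hypothetical, trimmed-down stand-in for corev1.Event.
type event struct {
	Kind   string // InvolvedObject.Kind
	Name   string // InvolvedObject.Name
	Reason string
}

// checkAtLeastOnce (hypothetical helper) returns an error unless every
// expected (kind, name) pair has at least one event with the wanted reason.
// Extra or repeated events for the same object are tolerated.
func checkAtLeastOnce(events []event, expected map[string][]string, reason string) error {
	seen := map[string]bool{}
	for _, e := range events {
		if e.Reason == reason {
			seen[e.Kind+"/"+e.Name] = true
		}
	}
	for kind, names := range expected {
		for _, name := range names {
			if !seen[kind+"/"+name] {
				return fmt.Errorf("no %q event for %s %s", reason, kind, name)
			}
		}
	}
	return nil
}

func main() {
	// The three events from the flake above: the TaskRun event appears twice.
	events := []event{
		{"TaskRun", "cancel-me-task-n9z44", "Failed"},
		{"PipelineRun", "cancel-me", "Failed"},
		{"TaskRun", "cancel-me-task-n9z44", "Failed"},
	}
	expected := map[string][]string{
		"PipelineRun": {"cancel-me"},
		"TaskRun":     {"cancel-me-task-n9z44"},
	}
	fmt.Println(checkAtLeastOnce(events, expected, "Failed")) // <nil>
}
```

With this shape the duplicate TaskRun event no longer fails the test, while a missing PipelineRun or TaskRun event still does.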

Steps to Reproduce the Problem

It's intermittent, so run it a lot 🤷

Additional Info

I'm running e2e on KinD on Github Actions, with Tekton at ~HEAD.

@mattmoor mattmoor added the kind/bug Categorizes issue or PR as related to a bug. label Oct 11, 2020
@afrittoli afrittoli added the kind/flake Categorizes issue or PR as related to a flakey test label Oct 12, 2020
@afrittoli
Member

afrittoli commented Oct 12, 2020

Yeah, the TaskRun event is sent twice; the second time it shows Count:2.
The resource versions of the two events differ (4665 and 4669), but the resource version of the TaskRun is the same (4657) in both. I wonder why this happens only with the cancel_test, and why it started failing all of a sudden...
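
In the log, both TaskRun entries also share the same event Name (`cancel-me-task-n9z44.163cf95e6d3a99c6`) and UID (`378a6231-6314-4737-b2d5-f79b65e97a9f`), differing only in ResourceVersion and Count. That suggests the test observed the same Event object twice, before and after its count was bumped. A hedged sketch of collapsing such repeats by UID before counting, again using a hypothetical simplified `event` struct rather than the real `corev1.Event`; `dedupeByUID` is a hypothetical helper:

```go
package main

import "fmt"

// event is a hypothetical, trimmed-down stand-in for corev1.Event.
type event struct {
	UID             string
	Name            string
	ResourceVersion string
	Count           int32
}

// dedupeByUID (hypothetical helper) keeps one entry per event UID,
// preferring the copy with the highest Count. Order of first appearance
// is preserved.
func dedupeByUID(events []event) []event {
	idx := map[string]int{} // UID -> position in out
	var out []event
	for _, e := range events {
		if i, ok := idx[e.UID]; ok {
			if e.Count > out[i].Count {
				out[i] = e
			}
			continue
		}
		idx[e.UID] = len(out)
		out = append(out, e)
	}
	return out
}

func main() {
	// UID, Name, ResourceVersion and Count copied from the flake output above.
	events := []event{
		{"378a6231-6314-4737-b2d5-f79b65e97a9f", "cancel-me-task-n9z44.163cf95e6d3a99c6", "4665", 1},
		{"eaa8ecae-1b5f-4bb6-8043-8b5c7ad18c90", "cancel-me.163cf95e6a3e3d18", "4664", 2},
		{"378a6231-6314-4737-b2d5-f79b65e97a9f", "cancel-me-task-n9z44.163cf95e6d3a99c6", "4669", 2},
	}
	deduped := dedupeByUID(events)
	fmt.Println(len(deduped)) // 2
	for _, e := range deduped {
		fmt.Println(e.Name, e.ResourceVersion, e.Count)
	}
}
```

Deduplicating by UID keeps the exact-count assertion meaningful while tolerating updates to an already-observed Event object.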

@afrittoli
Member

Other examples of failures, from #3374:

#3369
#3353
#3242

@mattmoor
Member Author

@afrittoli Not sure from your comment what you think the remediation should be?

@afrittoli
Member

afrittoli commented Oct 14, 2020

For now I changed this specific test to allow duplicate events, and I filed another issue, #3375, to investigate this further.

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 12, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 11, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

3 participants