
TaskRuns may be orphaned by a PipelineRun #2558

Closed
afrittoli opened this issue May 6, 2020 · 3 comments · Fixed by #2573
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@afrittoli
Member

Expected Behavior

The PipelineRun controller should always be aware of the TaskRuns owned by a PipelineRun, even when the PipelineRun status is stale (because of API issues).
Before using the PipelineRun status as a source of truth, we should check it against the list of TaskRuns owned by the PipelineRun.
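
For illustration, a minimal sketch of such a cross-check, assuming the child TaskRuns carry a `tekton.dev/pipelineRun` label and that a TaskRun lister is available (the label key, lister wiring, and package paths are assumptions for this sketch, not a reference to the eventual fix):

```go
package pipelinerun // illustrative package name

import (
	"k8s.io/apimachinery/pkg/labels"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	listers "github.com/tektoncd/pipeline/pkg/client/listers/pipeline/v1beta1"
)

// orphanedTaskRuns lists the TaskRuns labelled as children of this PipelineRun
// and returns those that the (possibly stale) status does not know about.
// The label key and the Status.TaskRuns layout reflect the v1beta1 API at the
// time of this issue and are assumptions, not the eventual fix.
func orphanedTaskRuns(pr *v1beta1.PipelineRun, trLister listers.TaskRunLister) ([]*v1beta1.TaskRun, error) {
	selector := labels.SelectorFromSet(labels.Set{"tekton.dev/pipelineRun": pr.Name})
	children, err := trLister.TaskRuns(pr.Namespace).List(selector)
	if err != nil {
		return nil, err
	}
	var orphans []*v1beta1.TaskRun
	for _, tr := range children {
		if _, known := pr.Status.TaskRuns[tr.Name]; !known {
			orphans = append(orphans, tr)
		}
	}
	return orphans, nil
}
```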

Actual Behavior

When TaskRuns are created during the reconcile cycle of a PipelineRun, they are added to the PipelineRun status, which is then synced back to etcd. The sync may fail because of any number of error conditions; if that happens, the TaskRuns are partially "orphaned": the PipelineRun will still be queued for reconciliation when they change, and they will still be deleted when the PipelineRun is deleted, but the PipelineRun controller will not be aware that they exist, and it will create new, duplicate TaskRuns.

This happens because ResolvePipelineRun relies on the PipelineRun status alone, via the getTaskRunName function.
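
For context, a status-first lookup with a fallback to TaskRuns actually observed in the cluster might look roughly like the sketch below; the `childByTask` map (e.g. built from the listed children, keyed by a `tekton.dev/pipelineTask` label) and the `newTaskRunName` helper are hypothetical, not the code in ResolvePipelineRun:

```go
package pipelinerun // illustrative package name

import (
	"fmt"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

// getTaskRunNameWithFallback resolves the TaskRun name for one pipeline task:
// the status is consulted first, then TaskRuns that exist in the cluster but
// never made it into the status, so a failed status update cannot lead to a
// duplicate TaskRun.
func getTaskRunNameWithFallback(status map[string]*v1beta1.PipelineRunTaskRunStatus,
	childByTask map[string]string, ptName, prName string) string {

	// 1. A name already recorded in the (possibly stale) PipelineRun status.
	for trName, s := range status {
		if s != nil && s.PipelineTaskName == ptName {
			return trName
		}
	}
	// 2. A TaskRun that exists in the cluster but is missing from the status.
	if trName, ok := childByTask[ptName]; ok {
		return trName
	}
	// 3. Nothing exists yet: generate a fresh name (hypothetical helper).
	return newTaskRunName(prName, ptName)
}

// newTaskRunName is a hypothetical stand-in for however names are generated.
func newTaskRunName(prName, ptName string) string {
	return fmt.Sprintf("%s-%s", prName, ptName)
}
```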

Steps to Reproduce the Problem

  1. I would like to be able to reproduce this. It might be worth building a test case that reproduces the situation, but it could be difficult. If we can make specific API calls fail, we could allow creating a TaskRun while preventing the PipelineRun from being updated (see the sketch after this list).
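
One way to simulate that in a unit test, sketched with the reactor mechanism that client-gen fake clientsets expose (the package path, test name, and skeleton are illustrative, not an existing test):

```go
// Sketch: let TaskRun creation succeed but make every PipelineRun update fail,
// mimicking an API/etcd error right after the TaskRun name was generated.
package reconciler_test

import (
	"errors"
	"testing"

	"k8s.io/apimachinery/pkg/runtime"
	ktesting "k8s.io/client-go/testing"

	fakeclientset "github.com/tektoncd/pipeline/pkg/client/clientset/versioned/fake"
)

func TestTaskRunOrphanedOnStatusUpdateFailure(t *testing.T) {
	c := fakeclientset.NewSimpleClientset( /* seed Pipeline, PipelineRun, Task objects here */ )

	// TaskRun creation is left untouched; PipelineRun updates (which would
	// persist the generated TaskRun names in the status) are forced to fail.
	c.PrependReactor("update", "pipelineruns",
		func(action ktesting.Action) (handled bool, ret runtime.Object, err error) {
			return true, nil, errors.New("injected API failure")
		})

	// Reconcile the PipelineRun twice with this client: with the bug present,
	// the second pass cannot see the TaskRun created by the first pass and
	// creates a duplicate.
}
```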

Additional Info

  • Tekton Pipeline version: v0.12.0

@afrittoli afrittoli self-assigned this May 6, 2020
@afrittoli
Member Author

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 6, 2020
@n3wscott
Contributor

n3wscott commented May 6, 2020

My advice would be to not use the trackers for this, but to use an informer for the TaskRuns inside of PipelineRuns. We do this all over Knative; here is an example: https://github.com/knative/eventing/blob/master/pkg/reconciler/parallel/controller.go#L62-L65

The Parallel resource makes Subscriptions and manages their lifecycles.

The issue you are going to run into is that the generated name on the TaskRun is random.

I think you should set up a watch on TaskRuns, propagate the real names of the TaskRuns as they are generated, and use that list to reconcile their existence. You might want to link this up somehow with the generation of the Pipeline or PipelineRun.
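
A minimal sketch of that informer-based wiring, written with plain client-go types; the `watchChildTaskRuns` function, the `enqueueOwningPipelineRun` callback, and the informer package path are illustrative assumptions, not Tekton's actual controller setup:

```go
package pipelinerun // illustrative package name

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"

	informers "github.com/tektoncd/pipeline/pkg/client/informers/externalversions/pipeline/v1beta1"
)

// watchChildTaskRuns requeues the owning PipelineRun whenever one of its child
// TaskRuns changes. enqueueOwningPipelineRun is a hypothetical stand-in for
// whatever pushes the parent onto the reconciler's work queue.
func watchChildTaskRuns(taskRunInformer informers.TaskRunInformer, enqueueOwningPipelineRun func(interface{})) {
	taskRunInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
		// Only TaskRuns controlled by a PipelineRun are of interest.
		FilterFunc: func(obj interface{}) bool {
			object, ok := obj.(metav1.Object)
			if !ok {
				return false
			}
			owner := metav1.GetControllerOf(object)
			return owner != nil && owner.Kind == "PipelineRun"
		},
		Handler: cache.ResourceEventHandlerFuncs{
			AddFunc:    enqueueOwningPipelineRun,
			UpdateFunc: func(oldObj, newObj interface{}) { enqueueOwningPipelineRun(newObj) },
			DeleteFunc: enqueueOwningPipelineRun,
		},
	})
}
```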

@afrittoli
Member Author

> My advice would be to not use the trackers for this, but to use an informer for the TaskRuns inside of PipelineRuns. We do this all over Knative; here is an example: https://github.com/knative/eventing/blob/master/pkg/reconciler/parallel/controller.go#L62-L65

Today we do both, not sure why:

So we do run the reconciler whenever a TaskRun is updated.

> The Parallel resource makes Subscriptions and manages their lifecycles.

> The issue you are going to run into is that the generated name on the TaskRun is random.

> I think you should set up a watch on TaskRuns, propagate the real names of the TaskRuns as they are generated, and use that list to reconcile their existence. You might want to link this up somehow with the generation of the Pipeline or PipelineRun.

When we generate the name for a TaskRun, we store it in the PipelineRun status.
However, if the sync back to etcd fails during the reconcile cycle in which the name was generated, we lose track of that name. I've not seen it happen in the wild, to be honest, but it seems to be a possibility, which is what this issue is about.
