Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the way resources are injected into the reconcilers #3305

Open
pierretasci opened this issue Sep 30, 2020 · 15 comments
Open

Refactor the way resources are injected into the reconcilers #3305

pierretasci opened this issue Sep 30, 2020 · 15 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@pierretasci
Copy link

pierretasci commented Sep 30, 2020

Feature request

The current assumption in Tekton is that most of the things it needs to look up to be able to reconcile a TaskRun or PipelineRun exist within the cluster. Until now, this was a fair assumption and it makes the reconcile loop rather straight-forward.

Going forward, with the introduction of Tekton Bundles and soon to be other forms of external dependencies, there is a missing concept of the Tekton controllers which is that of "pre-work". That is, reconciliation should not block but reaching out to external services like an image registry is expensive and error prone. For future expansion, we should add another step in the process between when the object is created and when it start running to fetch and prepare and "prework". This will likely require a redesign of how the controllers reconcile.

@pierretasci pierretasci added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 30, 2020
@vdemeester vdemeester added this to the Pipelines v0.18 milestone Oct 8, 2020
@vdemeester
Copy link
Member

/assign

@pritidesai
Copy link
Member

pritidesai commented Nov 4, 2020

Hey @vdemeester I am moving this out of 0.18 (to 0.19) as per our discussion in the General WG, "moving bundle behind a feature flag give that its still in alpha"

(see PR #3492 for more discussion)

@ghost
Copy link

ghost commented Jan 25, 2021

Pushing this one back to 0.22 since it sounds like there are a number of connected pieces that need to be pulled together to get this into good shape.

@dibyom dibyom modified the milestones: Pipelines 0.22, Pipelines 0.23 Mar 9, 2021
@ghost
Copy link

ghost commented Mar 23, 2021

Pushing this one back again to 0.24.

@ghost
Copy link

ghost commented May 18, 2021

We could spend some effort taking some measurements in our dogfooding clusters to see how a change here could have an impact before we commit to refactoring?

@pierretasci do you have an idea of how you'd like to proceed on this one? Is it still worth keeping in the milestones?

@pierretasci
Copy link
Author

I think this is going to need to happen for remote resource resolution anyways, so perhaps we refocus this on what is needed to get that to work

@ghost
Copy link

ghost commented May 21, 2021

Makes sense, I will remove this from milestones for now on the basis that remote resolution is still in the design stages.

@ghost ghost removed this from the Pipelines v0.25 milestone May 21, 2021
@ghost
Copy link

ghost commented Jul 16, 2021

/assign

I've started working on this in a branch. The way I've structured it at the moment, I've introduced a new reconciler whose sole responsibility is resolving. Once the specs have been resolved into the status field of the taskrun / pipelinerun then the "original" reconcilers take over again and actuate as normal.

@tekton-robot tekton-robot assigned ghost Jul 16, 2021
@ghost
Copy link

ghost commented Jul 19, 2021

One interesting wrinkle during this refactor: our resolution logic is pretty directly tied to the way we update annotations and labels on taskruns/pipelineruns. As soon as we've resolved the task or pipeline we copy the labels and annotations over to the run. So at the moment the "Resolver Reconciler" I've introduced is responsible not only for resolving resources but also copying those annotations/labels over too.

At some point we'll probably want to change this. I think ideally the resolution layer shouldn't also be responsible for deciding which labels and annotations are copied to runs.

@mattmoor
Copy link
Member

mattmoor commented Sep 5, 2021

That is, reconciliation should not block but reaching out to external services like an image registry is expensive and error prone

I thought I'd chime in here, since I noticed an experiment that's got multiple reconcilers parting on *Run status, which is kind of an anti-pattern.

So we have a number of places we've dealt with problems very similar to this in Knative with a fair degree of success. Two that are top-of-mind for me are:

  • probing the dataplane for kingress readiness,
  • tag to digest resolution.

Both of these expose idempotent interfaces to the core reconciler, uses a workqueue for, and has a callback that enqueues the original resource when the job is done. This pattern has served us very well, and I'd definitely recommend it as a way of offloading work from the main reconciler.


That said, I think the main thing I'd change with these is to actually use a child resource as the API, and leverage a proper reconciler to manage the workqueuing (and OwnerRefs to handle queuing the parent on completion).

It is notable that this is effectively the direction that y'all are heading, except instead of reconciling a child resource you are splitting the reconciliation of a single resource across two reconcilers (which I would strongly discourage).

I'd be happy to jump on a call to discuss this more.

@ghost
Copy link

ghost commented Sep 13, 2021

This sounds like the Tekton Resource Request CRD alternative captured in the resolution TEP. That proposal introduces a new resource type (child resource?) and reconciler as you describe. Good to know this is a preferred approach.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2021
@ghost
Copy link

ghost commented Dec 13, 2021

/lifecycle frozen

@tekton-robot tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 13, 2021
@ghost
Copy link

ghost commented Dec 13, 2021

/remove-lifecycle stale

justification: this may still happen if TEP-0060 gets approved and implemented.

@ghost
Copy link

ghost commented Mar 8, 2022

In #4596 we're adding support for pipelineRefs via remote resolution. This makes the resolution bit non-blocking to a PipelineRun's reconcile. However the code still maintains the PipelineRun's existing resolution mechanics alongside the new stuff and could still use a refactor, perhaps if/when remote resolution moves out of alpha. One option would be for every impl, including Tekton's built-in, to be moved behind the remote.Resolver interface that bundles introduced and Pipeline's reconcilers could just speak to that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

6 participants