Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PipelineRun Status bloat and excessive updates #3140

Closed
skaegi opened this issue Aug 26, 2020 · 15 comments
Closed

PipelineRun Status bloat and excessive updates #3140

skaegi opened this issue Aug 26, 2020 · 15 comments
Assignees
Labels
area/performance Issues or PRs that are related to performance aspects. kind/design Categorizes issue or PR as related to design. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@skaegi
Copy link
Contributor

skaegi commented Aug 26, 2020

In our production systems we are seeing large PR objects frequently over 1MB in size when serialized as JSON with nearly all the information in status pipelineSpec and taskruns. This might seem big now but I would bet this is really just the beginning and as we see more Tekton usage things are going to get a lot larger. In addition as a pipeline runs the PRs get updated along with changes to status.taskruns. This is triggering a huge amount of traffic for listening systems like the dashboard and our other systems that react to events in cluster. We're already hitting internal cluster limitations and have had to trim the PRs significantly to avoid overwhelming our messaging systems.

Looking at how the core Kubernetes API use status, it seems limited to very basic information and avoiding redundant information already captured in child or related objects. How painful would it be to re-work status to be lightweight?

@skaegi skaegi added the kind/bug Categorizes issue or PR as related to a bug. label Aug 26, 2020
@imjasonh
Copy link
Member

To me this seems to be closely related to #454 -- if we had an off-cluster system to store *Run statuses, we wouldn't be bound by etcd's storage and performance issues around l arge objects. (That being said, we would be bound by whatever that system's limitations are 😄 )

As an alternative, we could phase out reporting all of each sub-*Runs' statuses in the PipelineRun, and just require clients to look up all the constituent *Runs' statuses in order to display all that information. Or we could only report sub-Runs' .status.conditions to signal success/failure/ongoing, and stop reporting per-step information, timing, etc. These would need to be carefully communicated and staged slowly to avoid breaking clients that may rely on this information today.

@bobcatfish
Copy link
Collaborator

i believe the chains folks will have some thoughts here as well @dlorenc

@bobcatfish
Copy link
Collaborator

Also more info in #279

The PRs that made these changes are #2444 and #2489

In our production systems we are seeing large PR objects frequently over 1MB in size when serialized as JSON with nearly all the information in status pipelineSpec and taskruns.

one thought: maybe dereferencing all the taskruns into the pipelinerun is too far?

@imjasonh
Copy link
Member

one thought: maybe dereferencing all the taskruns into the pipelinerun is too far?

I think it makes sense to report each TaskRun's .status.conditions (success/failure) and results, and maybe a few more things, but per-step information might be overkill, and probably contributes the most to storage bloat. Updating the PipelineRun each time a step starts/finishes also means a lot more updates.

If Chains needs some complete record of each internal state I assume they could do the TaskRun lookups themselves, or consult a Results API which has done all the lookups, when that exists.

@bobcatfish bobcatfish added kind/design Categorizes issue or PR as related to design. and removed kind/bug Categorizes issue or PR as related to a bug. labels Sep 1, 2020
@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 30, 2020
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 30, 2020
@tekton-robot
Copy link
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ghost
Copy link

ghost commented May 7, 2021

/reopen

This still seems worth discussing, even if we end up deciding that keeping a copy of the exact pipeline + tasks in the PipelineRun's status is indeed what we want to do.

@tekton-robot
Copy link
Collaborator

@sbwsg: Reopened this issue.

In response to this:

/reopen

This still seems worth discussing, even if we end up deciding that keeping a copy of the exact pipeline + tasks in the PipelineRun's status is indeed what we want to do.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot reopened this May 7, 2021
@ghost ghost added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels May 7, 2021
@ghost
Copy link

ghost commented May 7, 2021

Freezing for future review.

@lbernick
Copy link
Member

/assign
/assign jerop

abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Mar 29, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Mar 29, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Mar 29, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Mar 31, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 1, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 4, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 4, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 6, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 6, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 6, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of the flags/fields/docs changes in tektoncd#4705 and the test changes in

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 8, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of the flags/fields/docs changes in tektoncd#4705 and the test changes in

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 11, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of the flags/fields/docs changes in tektoncd#4705 and the test changes in

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 13, 2022
…d status

This builds on tektoncd#4694, tektoncd#4734, and tektoncd#4753. It will feed into a revamped tektoncd#4739, all as part
of https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
and tektoncd#3140.

Specifically, this adds functionality to `pkg/reconciler/pipelinerun/resources` in
`pipelinerunresolution.go` and `pipelinerunstate.go` which will be needed for the
full implementation. These changes won't have any effects in the current situation,
because `pr.Status.ChildReferences` is never populated, so can be made independently
of the rest of the implementation, thus also shrinking the size of the rest of the
implementation PR(s) for easier review.

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 13, 2022
…d status

This builds on tektoncd#4694, tektoncd#4734, and tektoncd#4753. It will feed into a revamped tektoncd#4739, all as part
of https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
and tektoncd#3140.

Specifically, this adds functionality to `pkg/reconciler/pipelinerun/resources` in
`pipelinerunresolution.go` and `pipelinerunstate.go` which will be needed for the
full implementation. These changes won't have any effects in the current situation,
because `pr.Status.ChildReferences` is never populated, so can be made independently
of the rest of the implementation, thus also shrinking the size of the rest of the
implementation PR(s) for easier review.

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 13, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of the flags/fields/docs changes in tektoncd#4705 and the test changes in

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 13, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of the flags/fields/docs changes in tektoncd#4705 and the test changes in

Signed-off-by: Andrew Bayer <[email protected]>
tekton-robot pushed a commit that referenced this issue Apr 14, 2022
…d status

This builds on #4694, #4734, and #4753. It will feed into a revamped #4739, all as part
of https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
and #3140.

Specifically, this adds functionality to `pkg/reconciler/pipelinerun/resources` in
`pipelinerunresolution.go` and `pipelinerunstate.go` which will be needed for the
full implementation. These changes won't have any effects in the current situation,
because `pr.Status.ChildReferences` is never populated, so can be made independently
of the rest of the implementation, thus also shrinking the size of the rest of the
implementation PR(s) for easier review.

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 14, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of the flags/fields/docs changes in tektoncd#4705 and the test changes in

Signed-off-by: Andrew Bayer <[email protected]>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Apr 14, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#4753
* tektoncd#4757
* tektoncd#4760
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of all the other PRs referenced above.

Signed-off-by: Andrew Bayer <[email protected]>
tekton-robot pushed a commit that referenced this issue Apr 15, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* #4705
* #4734
* #4753
* #4757
* #4760
* #3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of all the other PRs referenced above.

Signed-off-by: Andrew Bayer <[email protected]>
@lbernick lbernick moved this from In Progress to Done in Pipelines V1 Apr 19, 2022
@lbernick
Copy link
Member

/close

TEP-0100 now implemented

@tekton-robot
Copy link
Collaborator

@lbernick: Closing this issue.

In response to this:

/close

TEP-0100 now implemented

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

chitrangpatel pushed a commit to chitrangpatel/pipeline that referenced this issue May 3, 2022
…d status

This builds on tektoncd#4694, tektoncd#4734, and tektoncd#4753. It will feed into a revamped tektoncd#4739, all as part
of https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
and tektoncd#3140.

Specifically, this adds functionality to `pkg/reconciler/pipelinerun/resources` in
`pipelinerunresolution.go` and `pipelinerunstate.go` which will be needed for the
full implementation. These changes won't have any effects in the current situation,
because `pr.Status.ChildReferences` is never populated, so can be made independently
of the rest of the implementation, thus also shrinking the size of the rest of the
implementation PR(s) for easier review.

Signed-off-by: Andrew Bayer <[email protected]>
chitrangpatel pushed a commit to chitrangpatel/pipeline that referenced this issue May 3, 2022
…pelineRuns

See:
* https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
* tektoncd#4705
* tektoncd#4734
* tektoncd#4753
* tektoncd#4757
* tektoncd#4760
* tektoncd#3140

This implements TEP-0100, allowing for choosing between the original full embedded `TaskRun` and
`Run` statuses in `PipelineRun` statuses, minimal child references to the underlying `TaskRun` and
`Run`s, or both, building on top of all the other PRs referenced above.

Signed-off-by: Andrew Bayer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Issues or PRs that are related to performance aspects. kind/design Categorizes issue or PR as related to design. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
Status: Done
Development

No branches or pull requests

8 participants