Pipeline metrics reporting incorrect values #3029

Closed
rlandesman opened this issue Jul 29, 2020 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@rlandesman
Contributor

Expected Behavior

In the Tekton CLI, tkn tr list shows hundreds of successful TaskRuns, each triggered by a demo pipeline I have set up. The PromQL query tekton_taskrun_count shows the correct number of successful/failed TaskRuns, and I expected tekton_taskrun_duration_seconds_count to show the duration of each of these TaskRuns.

Screen Shot 2020-07-29 at 8 28 13 AM

Actual Behavior

The PromQL query tekton_taskrun_duration_seconds_count returns just 5 (failed) TaskRuns. Furthermore, when I try to break the data down using the tekton_pipelinerun_taskrun_duration_seconds_count metric, I see unusual durations for TaskRuns.

Screen Shot 2020-07-29 at 8 25 34 AM
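
For context on reading these series: assuming tekton_taskrun_duration_seconds is exposed as a Prometheus histogram (which the _count suffix suggests), the _count series is the number of recorded observations, not a duration, and an average duration comes from dividing _sum by _count. The sketch below is a toy model of that bookkeeping, not the Prometheus client library; the class name, bucket bounds, and sample durations are all made up for illustration.

```python
class ToyHistogram:
    """Hypothetical stand-in for a Prometheus histogram, for illustration only."""

    def __init__(self, bounds=(10.0, 30.0, 60.0)):
        # Upper bucket bounds in seconds (assumed values); the last slot is +Inf.
        self.bounds = bounds
        self.buckets = [0] * (len(bounds) + 1)
        self.total = 0.0  # what would be exposed as <name>_sum
        self.count = 0    # what would be exposed as <name>_count

    def observe(self, seconds):
        # Buckets are cumulative: every bucket whose bound is >= the value grows.
        for i, bound in enumerate(self.bounds):
            if seconds <= bound:
                self.buckets[i] += 1
        self.buckets[-1] += 1  # +Inf bucket sees every observation
        self.total += seconds
        self.count += 1


h = ToyHistogram()
for duration in (5.0, 20.0, 45.0):  # three made-up TaskRun durations
    h.observe(duration)

# _count is "how many TaskRuns were observed", not "how long they took".
average = h.total / h.count
print(h.count, h.total, round(average, 2))
```

Under that assumption, a _count of 5 would mean five duration observations were recorded for the matching label set, which is a separate question from how long each TaskRun ran.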

Steps to Reproduce the Problem

  1. Run a pipeline with multiple tasks to populate TaskRuns
  2. Query the data using promQL

Additional Info

I'm rather new to Tekton/Open Source, so there's a chance I'm just missing something here, but as of now this looks like a bug to me. Would love to help out in any way I can!

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T23:35:15Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.7", GitCommit:"b4455102ef392bf7d594ef96b97a4caa79d729d9", GitTreeState:"clean", BuildDate:"2020-06-17T11:32:20Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

Client version: 0.10.0
Pipeline version: v0.14.1
Triggers version: unknown
@rlandesman rlandesman added the kind/bug Categorizes issue or PR as related to a bug. label Jul 29, 2020
@pritidesai
Member

Thanks @rlandesman, we have a similar issue reported for tekton_pipelinerun_count in #2844

@vincent-pli
Member

@pritidesai
#2844 is about the number of successful TaskRuns/PipelineRuns being doubled, but this issue is about failed objects.
The root cause is that when a TaskRun fails, we do not forget it in the work queue, so the failed TaskRun is reconciled again and again. Because a failed status counts as done, the metric is recorded on every reconcile, so its value ends up larger than it should be.
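
A toy simulation of the behaviour described above may make the over-counting easier to see. This is a sketch under stated assumptions, not Tekton's controller code: TaskRun, reconcile_all, and the forget_failed flag are hypothetical names, and the queue is a plain deque rather than a real Kubernetes work queue.

```python
from collections import deque


class TaskRun:
    """Hypothetical finished TaskRun; `failed` marks an unsuccessful one."""

    def __init__(self, name, failed):
        self.name = name
        self.failed = failed
        self.done = True  # both succeeded and failed TaskRuns are "done"


def reconcile_all(taskruns, rounds, forget_failed):
    """Process the queue for at most `rounds` steps and return the metric count."""
    queue = deque(taskruns)
    recorded = 0
    for _ in range(rounds):
        if not queue:
            break
        tr = queue.popleft()
        if tr.done:
            recorded += 1  # metric is recorded on every reconcile of a done TaskRun
        if tr.failed and not forget_failed:
            queue.append(tr)  # failed TaskRun stays in the queue and comes back
    return recorded


runs = [TaskRun("ok", failed=False), TaskRun("bad", failed=True)]

inflated = reconcile_all(runs, rounds=5, forget_failed=False)
expected = reconcile_all(runs, rounds=5, forget_failed=True)
print(inflated, expected)
```

With two finished TaskRuns, the version that keeps requeueing the failed one records a count that grows with every extra reconcile, while the version that drops it from the queue records each TaskRun once.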

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 29, 2020
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 28, 2020
@pritidesai
Member

/remove-lifecycle rotten

@rlandesman are you still experiencing this issue?

@rlandesman
Contributor Author

Not a huge issue for me atm, I think it's fair to remove this from priority until it comes up again

@pritidesai
Member

Closing this for now; we will revisit in the future if needed.


4 participants