Error count "failed" taskrun for metric of Prometheus #3045
Comments
Thanks @vincent-pli. Yup, once it enters I am debugging into metrics. When I ran a sample invalid TaskRun, the controller log file has
And each TaskRun has the exact same status:
I am not able to reproduce the metrics count yet; I will give it one more try (I don't have Prometheus set up on my local cluster).
I am debugging this with:
But like you mentioned, in case of
@pritidesai, so you reproduced it? I mean in this section: pipeline/pkg/reconciler/taskrun/taskrun.go Line 124 in 28d1347
The error there is a normal error, which will cause the issue.
/assign
Yes, it raises an error with
After raising
The following unit test also does not catch this: it verifies the permanent error but does not validate that the TaskRun is not requeued. I will add that test: pipeline/pkg/reconciler/taskrun/taskrun_test.go Lines 1698 to 1705 in cb85f1d
I have changes in place and will create a PR.
Expected Behavior
The number of failed TaskRuns is counted correctly.
Actual Behavior
Incorrect
Steps to Reproduce the Problem
1. Create a taskrun that refers to a task which cannot be found.
2. Check the tekton_taskrun_count{status="failed"} metric.
Additional Info
When such an exception happens (pipeline/pkg/reconciler/taskrun/taskrun.go, Lines 240 to 245 in c185296), a PermanentError is raised and knative/pkg will forget the taskrun from the workqueue, which I guess is what we expect.
But we also update the .status, which enqueues the taskrun object again (the same object, but with a different .status). It then enters pipeline/pkg/reconciler/taskrun/taskrun.go, Lines 109 to 110 in c185296, directly and is counted in tekton_taskrun_count{status="failed"}.
If any error occurs in this section (if tr.IsDone()), a normal error is raised, which means the taskrun will be re-enqueued again and again, based on this implementation: https://github.com/kubernetes/client-go/blob/00dbcca6ee44c678754d3f5fda1bd0e704b26fe2/util/workqueue/default_rate_limiters.go#L89
A candidate solution is to wrap all errors in this section as PermanentError so that the workqueue forgets the taskrun.