TEP-0006: New (sub-)metrics #146

NavidZ · 2020-07-13T18:07:40Z

This is a barebone TEP for adding new metrics as
suggested by the TEP template. At this point it
has only the problem statement and a testing plan.
I will iterate more to add the actual (sub-)metrics
and what they measure and where should they be
measured in the code in subsequent PRs.

tektoncd/pipeline#2814

NavidZ · 2020-07-14T14:22:52Z

/assign afrittoli

mrutkows · 2020-07-14T16:35:21Z

I believe that moving to OpenTelemetry should be a goal if we are attempting to put metrics on the correct path for the future as that is the direction of OpenCensus.

NavidZ · 2020-07-14T16:39:53Z

I believe that moving to OpenTelemetry should be a goal if we are attempting to put metrics on the correct path for the future as that is the direction of OpenCensus.

I agree. But at the moment we use the Knative reporting libraries. We could alternatively wait for them to migrate and just rely on their migration as I mentioned in this comment. What do you think?

mrutkows · 2020-07-15T14:53:58Z

I agree. But at the moment we use the Knative reporting libraries. We could alternatively wait for them to migrate and just rely on their migration as I mentioned in this comment. What do you think?

I would assume that it would be the goal of Tekton to have its own metrics/reporting (exporter/observability config) code and not use Knative's in order to have the ability to move independently of Knative. If Knative were a dedicated monitoring/metrics project that intended other projects to directly reuse its metrics-related libraries, I would feel differently, but instead believe that Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

NavidZ · 2020-07-15T17:27:23Z

I would assume that it would be the goal of Tekton to have its own metrics/reporting (exporter/observability config) code and not use Knative's in order to have the ability to move independently of Knative. If Knative were a dedicated monitoring/metrics project that intended other projects to directly reuse its metrics-related libraries, I would feel differently, but instead believe that Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

That's a very good point. I now am more inclined to indeed have our own reporting system after your comment and decouple it from the Knative at some point. One quick problem is that OpenTelemetry APIs are in alpha (and recently some in beta). So I'm not sure it is at the stage that we accept using in this project as is. I defer this decision to you and others (cc @afrittoli @vdemeester) who have more experience and I will follow your guidance. Let me know.

vdemeester · 2020-07-16T09:27:24Z

I would assume that it would be the goal of Tekton to have its own metrics/reporting (exporter/observability config) code and not use Knative's in order to have the ability to move independently of Knative. If Knative were a dedicated monitoring/metrics project that intended other projects to directly reuse its metrics-related libraries, I would feel differently, but instead believe that Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

So I am not fully sure I agree with that. We are using knative.dev/pkg as a library (as knative component does), and it provides some metrics capabilities. From my perspective, anything that is in knative.dev/pkg is meant to be used to knative components and any other project that wants to benefit from it.

Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

That's a good question 🙃. I see knative.dev/pkg as exactly this : library that can be used for downstream projects 🙃 (cc @mattmoor )

mattmoor · 2020-07-16T15:23:41Z

cc @evankanderson has been poking at OpenCensus + OpenTelemetry stuff in the context of knative/pkg.

evankanderson · 2020-07-17T03:17:44Z

My suggestion would be to use the upstream opencensus libraries directly with the opencensus collector, and plan at some point to move to opentelemetry when their metrics support is production quality.

The knative/pkg metrics package as currently implemented addresses a number of problems which Tekton probably doesn't need to handle. (The one part which may be useful is the support for exporting metrics for multiple Resources from a single process. If that is useful, it may be better to upstream it.) The capabilities that make the package a poor fit are:

There's a lot of code for dynamically changing the reporting backend at runtime. This is a lot of code compared with the alternative of restarting the binary (or the opencensus collector) if you switch from one metrics system to another.
The pkg library only supports three backends: Stackdriver (with a bunch of Knative-specific overrides), Prometheus, and OpenCensus. The collector supports many more with a simpler config model.
Most of the library is heavily entwined with the Knative config-observability ConfigMap, and the external interfaces have not been carefully designed.

NavidZ · 2020-07-17T13:36:58Z

@vdemeester @mrutkows based on what Evan said are we okay at least with the (non-)goals and other sections so far?

NavidZ · 2020-08-04T16:16:15Z

/assign vdemeester

vdemeester

based on what Evan said are we okay at least with the (non-)goals and other sections so far?

@NavidZ I think we are. Let's focus this TEP around the metrics we want to gather/report 👍

vdemeester · 2020-08-04T16:16:06Z

teps/000X-tekton-metrics.md

+
+-->
+
+# TEP-NNNN: More granular metrics


We will need to assign a number 👼

Done. I picked 0006 as that doesn't seem to exist yet in the repo.

This is a barebone TEP for adding new metrics as suggested by the TEP template. At this point it has only the summary, motivation, and a testing plan to get aggreement on the general path. I will iterate more to add the proposed submetrics and what they measure and where should they be measured in the code in subsequent PRs.

NavidZ · 2020-08-04T17:48:25Z

/assign @afrittoli

NavidZ · 2020-08-13T15:30:55Z

ping @afrittoli

afrittoli

This looks good to me as a starting point.
I like the idea of describing the general intention in the 'proposed' state and add the bulk of the content in the next iteration.
/lgtm

/cc @vdemeester @imjasonh @hrishin @mrutkows

tekton-robot · 2020-08-13T15:44:24Z

@afrittoli: GitHub didn't allow me to request PR reviews from the following users: mrutkows.

Note that only tektoncd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

This looks good to me as a starting point.
I like the idea of describing the general intention in the 'proposed' state and add the bulk of the content in the next iteration.
/lgtm

/cc @vdemeester @imjasonh @hrishin @mrutkows

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

afrittoli · 2020-08-13T15:46:40Z

/hold
There are already two trigger TEPs which are competing for #6 - @NavidZ would you mind picking a different number and adding it to the PR title too so that no-one else takes it?

afrittoli · 2020-08-13T15:52:51Z

About using knative/pkg I would not say that moving off that dependency is the primary goal of this TEP, but it will probably happen as part of the technical design of the solution.

Something worth noting is that some part of the controller code are generated usingknative/pkg - and those include instrumentation with metrics about the controller queue. It would be good to have those available even if we decided to not depend on knative/pkg for Tekton own metrics.

NavidZ · 2020-08-13T17:05:11Z

/hold
There are already two trigger TEPs which are competing for #6 - @NavidZ would you mind picking a different number and adding it to the PR title too so that no-one else takes it?

The trigger ones took 8 and 9 not 6. Do you still want me to change this to say TEP0010?

afrittoli · 2020-08-17T16:08:41Z

/hold cancel

pritidesai · 2020-08-20T17:38:49Z

/kind tep

pritidesai · 2020-08-24T16:04:35Z

Please let us know the status and any blockers? @NavidZ

We are looking for examples and types of metrics, thanks

NavidZ · 2020-08-24T17:16:39Z

@pritidesai do we want to land this and I'll follow up with more PRs regarding the details?

bobcatfish · 2020-08-31T16:14:35Z

Had a brief discussion in the API working group today, @NavidZ suggested merging this with less details fleshed out as proposed, then incrementally adding more; we'd want to all be satisfied with the level of detail before moving to implmentable

In the meeting @imjasonh took an action to take a look as well

imjasonh · 2020-09-01T16:28:46Z

Sorry for the delay in getting to this.

This TEP as proposed lgtm, as others have said we'll definitely want to add more detail in future revisions before we get to implementable, but in general the overall idea of "we need better metrics" is pretty universally agreed upon AFAIK.

/lgtm

afrittoli · 2020-09-02T14:24:31Z

/approve

tekton-robot · 2020-09-02T14:24:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [afrittoli]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Meredith Lancaster <[email protected]> Signed-off-by: Meredith Lancaster <[email protected]>

tekton-robot requested review from afrittoli and kimsterv July 13, 2020 18:07

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 13, 2020

tekton-robot assigned afrittoli Jul 14, 2020

tekton-robot assigned vdemeester Aug 4, 2020

vdemeester reviewed Aug 4, 2020

View reviewed changes

NavidZ force-pushed the metrics-tep branch from 2199a87 to 7084c14 Compare August 4, 2020 16:25

afrittoli reviewed Aug 13, 2020

View reviewed changes

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2020

tekton-robot requested review from vdemeester, imjasonh and hrishin August 13, 2020 15:44

tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 13, 2020

NavidZ changed the title ~~Add TEP for new (sub-)metrics~~ TEP-0006: New (sub-)metrics Aug 13, 2020

tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 17, 2020

tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Aug 20, 2020

NavidZ requested a review from afrittoli August 24, 2020 17:17

tekton-robot assigned imjasonh Sep 1, 2020

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 2, 2020

tekton-robot merged commit fa12473 into tektoncd:master Sep 2, 2020

dlorenc pushed a commit to dlorenc/community that referenced this pull request Oct 27, 2022

add malancas to oncall team (tektoncd#146)

132a50c

Signed-off-by: Meredith Lancaster <[email protected]> Signed-off-by: Meredith Lancaster <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TEP-0006: New (sub-)metrics #146

TEP-0006: New (sub-)metrics #146

NavidZ commented Jul 13, 2020

NavidZ commented Jul 14, 2020

mrutkows commented Jul 14, 2020

NavidZ commented Jul 14, 2020

mrutkows commented Jul 15, 2020 •

edited

Loading

NavidZ commented Jul 15, 2020

vdemeester commented Jul 16, 2020

mattmoor commented Jul 16, 2020

evankanderson commented Jul 17, 2020

NavidZ commented Jul 17, 2020

NavidZ commented Aug 4, 2020

vdemeester left a comment

vdemeester Aug 4, 2020

NavidZ Aug 4, 2020

NavidZ commented Aug 4, 2020

NavidZ commented Aug 13, 2020

afrittoli left a comment

tekton-robot commented Aug 13, 2020

afrittoli commented Aug 13, 2020

afrittoli commented Aug 13, 2020

NavidZ commented Aug 13, 2020

afrittoli commented Aug 17, 2020

pritidesai commented Aug 20, 2020

pritidesai commented Aug 24, 2020 •

edited

Loading

NavidZ commented Aug 24, 2020

bobcatfish commented Aug 31, 2020

imjasonh commented Sep 1, 2020

afrittoli commented Sep 2, 2020

tekton-robot commented Sep 2, 2020

TEP-0006: New (sub-)metrics #146

TEP-0006: New (sub-)metrics #146

Conversation

NavidZ commented Jul 13, 2020

NavidZ commented Jul 14, 2020

mrutkows commented Jul 14, 2020

NavidZ commented Jul 14, 2020

mrutkows commented Jul 15, 2020 • edited Loading

NavidZ commented Jul 15, 2020

vdemeester commented Jul 16, 2020

mattmoor commented Jul 16, 2020

evankanderson commented Jul 17, 2020

NavidZ commented Jul 17, 2020

NavidZ commented Aug 4, 2020

vdemeester left a comment

Choose a reason for hiding this comment

vdemeester Aug 4, 2020

Choose a reason for hiding this comment

NavidZ Aug 4, 2020

Choose a reason for hiding this comment

NavidZ commented Aug 4, 2020

NavidZ commented Aug 13, 2020

afrittoli left a comment

Choose a reason for hiding this comment

tekton-robot commented Aug 13, 2020

afrittoli commented Aug 13, 2020

afrittoli commented Aug 13, 2020

NavidZ commented Aug 13, 2020

afrittoli commented Aug 17, 2020

pritidesai commented Aug 20, 2020

pritidesai commented Aug 24, 2020 • edited Loading

NavidZ commented Aug 24, 2020

bobcatfish commented Aug 31, 2020

imjasonh commented Sep 1, 2020

afrittoli commented Sep 2, 2020

tekton-robot commented Sep 2, 2020

mrutkows commented Jul 15, 2020 •

edited

Loading

pritidesai commented Aug 24, 2020 •

edited

Loading