Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEP-0006: New (sub-)metrics #146

Merged
merged 1 commit into from
Sep 2, 2020
Merged

Conversation

NavidZ
Copy link
Member

@NavidZ NavidZ commented Jul 13, 2020

This is a barebone TEP for adding new metrics as
suggested by the TEP template. At this point it
has only the problem statement and a testing plan.
I will iterate more to add the actual (sub-)metrics
and what they measure and where should they be
measured in the code in subsequent PRs.

tektoncd/pipeline#2814

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 13, 2020
@NavidZ
Copy link
Member Author

NavidZ commented Jul 14, 2020

/assign afrittoli

@mrutkows
Copy link
Member

I believe that moving to OpenTelemetry should be a goal if we are attempting to put metrics on the correct path for the future as that is the direction of OpenCensus.

@NavidZ
Copy link
Member Author

NavidZ commented Jul 14, 2020

I believe that moving to OpenTelemetry should be a goal if we are attempting to put metrics on the correct path for the future as that is the direction of OpenCensus.

I agree. But at the moment we use the Knative reporting libraries. We could alternatively wait for them to migrate and just rely on their migration as I mentioned in this comment. What do you think?

@mrutkows
Copy link
Member

mrutkows commented Jul 15, 2020

I agree. But at the moment we use the Knative reporting libraries. We could alternatively wait for them to migrate and just rely on their migration as I mentioned in this comment. What do you think?

I would assume that it would be the goal of Tekton to have its own metrics/reporting (exporter/observability config) code and not use Knative's in order to have the ability to move independently of Knative. If Knative were a dedicated monitoring/metrics project that intended other projects to directly reuse its metrics-related libraries, I would feel differently, but instead believe that Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

@NavidZ
Copy link
Member Author

NavidZ commented Jul 15, 2020

I would assume that it would be the goal of Tekton to have its own metrics/reporting (exporter/observability config) code and not use Knative's in order to have the ability to move independently of Knative. If Knative were a dedicated monitoring/metrics project that intended other projects to directly reuse its metrics-related libraries, I would feel differently, but instead believe that Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

That's a very good point. I now am more inclined to indeed have our own reporting system after your comment and decouple it from the Knative at some point. One quick problem is that OpenTelemetry APIs are in alpha (and recently some in beta). So I'm not sure it is at the stage that we accept using in this project as is. I defer this decision to you and others (cc @afrittoli @vdemeester) who have more experience and I will follow your guidance. Let me know.

@vdemeester
Copy link
Member

I would assume that it would be the goal of Tekton to have its own metrics/reporting (exporter/observability config) code and not use Knative's in order to have the ability to move independently of Knative. If Knative were a dedicated monitoring/metrics project that intended other projects to directly reuse its metrics-related libraries, I would feel differently, but instead believe that Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

So I am not fully sure I agree with that. We are using knative.dev/pkg as a library (as knative component does), and it provides some metrics capabilities. From my perspective, anything that is in knative.dev/pkg is meant to be used to knative components and any other project that wants to benefit from it.

Knative's mission is not to provide their metrics libs. as first-class, supported code for downstream projects to use and rely upon.

That's a good question 🙃. I see knative.dev/pkg as exactly this : library that can be used for downstream projects 🙃 (cc @mattmoor )

@mattmoor
Copy link
Member

cc @evankanderson has been poking at OpenCensus + OpenTelemetry stuff in the context of knative/pkg.

@evankanderson
Copy link

My suggestion would be to use the upstream opencensus libraries directly with the opencensus collector, and plan at some point to move to opentelemetry when their metrics support is production quality.

The knative/pkg metrics package as currently implemented addresses a number of problems which Tekton probably doesn't need to handle. (The one part which may be useful is the support for exporting metrics for multiple Resources from a single process. If that is useful, it may be better to upstream it.) The capabilities that make the package a poor fit are:

  • There's a lot of code for dynamically changing the reporting backend at runtime. This is a lot of code compared with the alternative of restarting the binary (or the opencensus collector) if you switch from one metrics system to another.
  • The pkg library only supports three backends: Stackdriver (with a bunch of Knative-specific overrides), Prometheus, and OpenCensus. The collector supports many more with a simpler config model.
  • Most of the library is heavily entwined with the Knative config-observability ConfigMap, and the external interfaces have not been carefully designed.

@NavidZ
Copy link
Member Author

NavidZ commented Jul 17, 2020

@vdemeester @mrutkows based on what Evan said are we okay at least with the (non-)goals and other sections so far?

@NavidZ
Copy link
Member Author

NavidZ commented Aug 4, 2020

/assign vdemeester

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on what Evan said are we okay at least with the (non-)goals and other sections so far?

@NavidZ I think we are. Let's focus this TEP around the metrics we want to gather/report 👍


-->

# TEP-NNNN: More granular metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to assign a number 👼

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I picked 0006 as that doesn't seem to exist yet in the repo.

This is a barebone TEP for adding new metrics as
suggested by the TEP template. At this point it
has only the summary, motivation, and a testing plan
to get aggreement on the general path.

I will iterate more to add the proposed submetrics
and what they measure and where should they be
measured in the code in subsequent PRs.
@NavidZ
Copy link
Member Author

NavidZ commented Aug 4, 2020

/assign @afrittoli

@NavidZ
Copy link
Member Author

NavidZ commented Aug 13, 2020

ping @afrittoli

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me as a starting point.
I like the idea of describing the general intention in the 'proposed' state and add the bulk of the content in the next iteration.
/lgtm

/cc @vdemeester @imjasonh @hrishin @mrutkows

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2020
@tekton-robot
Copy link
Contributor

@afrittoli: GitHub didn't allow me to request PR reviews from the following users: mrutkows.

Note that only tektoncd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

This looks good to me as a starting point.
I like the idea of describing the general intention in the 'proposed' state and add the bulk of the content in the next iteration.
/lgtm

/cc @vdemeester @imjasonh @hrishin @mrutkows

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@afrittoli
Copy link
Member

/hold
There are already two trigger TEPs which are competing for #6 - @NavidZ would you mind picking a different number and adding it to the PR title too so that no-one else takes it?

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 13, 2020
@afrittoli
Copy link
Member

About using knative/pkg I would not say that moving off that dependency is the primary goal of this TEP, but it will probably happen as part of the technical design of the solution.

Something worth noting is that some part of the controller code are generated usingknative/pkg - and those include instrumentation with metrics about the controller queue. It would be good to have those available even if we decided to not depend on knative/pkg for Tekton own metrics.

@NavidZ
Copy link
Member Author

NavidZ commented Aug 13, 2020

/hold
There are already two trigger TEPs which are competing for #6 - @NavidZ would you mind picking a different number and adding it to the PR title too so that no-one else takes it?

The trigger ones took 8 and 9 not 6. Do you still want me to change this to say TEP0010?

@NavidZ NavidZ changed the title Add TEP for new (sub-)metrics TEP-0006: New (sub-)metrics Aug 13, 2020
@afrittoli
Copy link
Member

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 17, 2020
@pritidesai
Copy link
Member

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Aug 20, 2020
@pritidesai
Copy link
Member

pritidesai commented Aug 24, 2020

Please let us know the status and any blockers? @NavidZ

We are looking for examples and types of metrics, thanks

@NavidZ
Copy link
Member Author

NavidZ commented Aug 24, 2020

@pritidesai do we want to land this and I'll follow up with more PRs regarding the details?

@NavidZ NavidZ requested a review from afrittoli August 24, 2020 17:17
@bobcatfish
Copy link
Contributor

Had a brief discussion in the API working group today, @NavidZ suggested merging this with less details fleshed out as proposed, then incrementally adding more; we'd want to all be satisfied with the level of detail before moving to implmentable

In the meeting @imjasonh took an action to take a look as well

@imjasonh
Copy link
Member

imjasonh commented Sep 1, 2020

Sorry for the delay in getting to this.

This TEP as proposed lgtm, as others have said we'll definitely want to add more detail in future revisions before we get to implementable, but in general the overall idea of "we need better metrics" is pretty universally agreed upon AFAIK.

/lgtm

@afrittoli
Copy link
Member

/approve

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 2, 2020
@tekton-robot tekton-robot merged commit fa12473 into tektoncd:master Sep 2, 2020
dlorenc pushed a commit to dlorenc/community that referenced this pull request Oct 27, 2022
Signed-off-by: Meredith Lancaster <[email protected]>

Signed-off-by: Meredith Lancaster <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants