Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics-generator should filter out spans based upon attributes #1482

Closed
yvrhdn opened this issue Jun 9, 2022 · 10 comments · Fixed by #2274
Closed

Metrics-generator should filter out spans based upon attributes #1482

yvrhdn opened this issue Jun 9, 2022 · 10 comments · Fixed by #2274
Assignees
Labels
component/metrics-generator enhancement New feature or request keepalive Label to exempt Issues / PRs from stale workflow
Milestone

Comments

@yvrhdn
Copy link
Member

yvrhdn commented Jun 9, 2022

Is your feature request related to a problem? Please describe.
If the metrics-generator is enabled for a tenant, it will generate metrics for all traces that are ingested. In some cases it would be nice if we could only generate metrics for a subset of spans.

Example:

  • don't collect metrics for traces that don't have the attribute http.code
  • only generate metrics for spans from service XXX

Describe the solution you'd like
It should be possible to configure filters that include/exclude spans based upon their attributes.

Describe alternatives you've considered
There is no alternative AFAIK. You could drop the metrics when remote writing them, but that is definitely not ideal.

Additional context

@yvrhdn yvrhdn changed the title Exclude spans from metrics-generator Metrics-generator should filter out spans based upon attributes Jun 27, 2022
@joe-elliott joe-elliott added this to the Next milestone Jul 12, 2022
@faustodavid
Copy link
Contributor

What do you think of having filters a bit similar to the OpenTelemetry Collector but just for the metrics processors?

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/internal/coreinternal/processor/filterspan/filterspan.go

It will allow multiple includes or excludes filters that can be defined using regexp or strict match type.

metrics_generator:
  processor:
    span_metrics:
      dimensions:
        - http.method
        - http.host
      span_filters:
        include:
          - match_type: regexp
            attributes:
              - key: http.method
                value: "GET|POST|PUT|DELETE|HEAD|OPTIONS|TRACE"
        exclude:
         - match_type: strict
           attributes:
            - key: http.host
              value: "some-host-name"

@yvrhdn
Copy link
Member Author

yvrhdn commented Jul 18, 2022

Yeah, that looks like a good reference! We won't be able to vendor the OTel processor directly, but we can mirror the structure of the configuration.

For a first implementation I'd like to keep it as simple as possible (just focus on the minimal features required to make this useful) so our initial implementation might be more limited.

@09jvilla
Copy link
Contributor

09jvilla commented Nov 4, 2022

Does this help decrease the count of series produced by the metrics generator and therefore does it have cost management applications? Or is it more about just getting metrics about the services or cases you actually care about without having those numbers skewed by the other ones?

@ie-pham
Copy link
Contributor

ie-pham commented Nov 4, 2022

If the filters are applied in a way that would reduce cardinality in the metrics produced by the Metrics Generator then potentially there could be a cost reduction.

@09jvilla
Copy link
Contributor

09jvilla commented Nov 4, 2022

Based on your response, though, it sounds like that is not the core use case or purpose of doing this? Its more like a nice secondary benefit?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 4, 2023

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.

@github-actions github-actions bot added the stale Used for stale issues / PRs label Jan 4, 2023
@joe-elliott joe-elliott added enhancement New feature or request keepalive Label to exempt Issues / PRs from stale workflow and removed stale Used for stale issues / PRs labels Jan 4, 2023
@NickAdolf
Copy link

This filtering is both to declutter metrics that won't actually be used, but also a cost savings for us.

As we look to broaden our OTel usage, including Faro the metrics generator is required to really harness the value but it will put extreme pressure on active series counts. This is the biggest impact for us, as a Grafana Cloud customer. The previously proposed solution would more than suffice for my particular use case.

@lmickh
Copy link

lmickh commented Feb 9, 2023

I think there is a benefit to filtering out the traces before then get sent to the metrics generator, but that might be out of the scope of this issue.

I have a use-case where we receive ~5TB trace data per day, but only want to generate metrics on a specific list of services. It would be great if the distributors could just not send that to the metrics generator and reduce the daily network costs in addition to the cpu/memory consumption of the metrics generator.

This could probably be done in my use-case by splitting Tempo tenants, but then I've got to push that logic down to the otel collector and deal with different services using different tenants even though they are part of the same stack that telemetry data gets referenced across.

It would be nice if the distributor could use a limit set of dimensions to filter with even if it was only tenant and an exact match on the service.name attribute.

@zalegrala zalegrala moved this from Todo to Next in Tempo squad Mar 1, 2023
@zalegrala zalegrala self-assigned this Mar 1, 2023
@zalegrala zalegrala moved this from Next to In Progress in Tempo squad Mar 28, 2023
@zalegrala
Copy link
Contributor

I've got a PR up to approach solving this issue. Review/testing welcome. I'd like to get this running somewhere and put it through the paces, but right now I have reasonable indication this is doing what we've want here based on the tests.

To the notion of filtering at the distributor, I think this is a great idea, but is more complicated due to the nature of the how the distributor works. The best approach to that might be to filter at the head using the agent or similar approach, so the distributor never receives the spans. But perhaps you are wanting to only filter out metrics generation and not the actual ingest of the spans. It might be worth a separate issue about how we could think about how we could approach that issue.

It came up during review with the team, that it might also be nice to filter based on TraceQL, given the amount of effort we're putting into the language. If we could tie into that area of the code, we may be able to re-use some filtering capabilities there, but I suspect this will be a follow-up issue.

@zalegrala
Copy link
Contributor

Note, any testers will want to read the docs that are also included in #2274.

@zalegrala zalegrala moved this from In Progress to In Review in Tempo squad Apr 27, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in Tempo squad May 2, 2023
@ie-pham ie-pham moved this from Done to Archived in Tempo squad May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/metrics-generator enhancement New feature or request keepalive Label to exempt Issues / PRs from stale workflow
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

8 participants