-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[metrics-generator] filter out spans based on policy #2274
Conversation
Thank you for adding documentation! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs look good! Thank you for adding them.
bf1a64f
to
dcdd76f
Compare
bca1054
to
126f390
Compare
spanMetricsCallsTotal registry.Counter | ||
spanMetricsDurationSeconds registry.Histogram | ||
spanMetricsSizeTotal registry.Counter | ||
spanMetricsFilterDropsTotal registry.Counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular reason we're pushing this and not just recording it as a normal prometheus metric in tempo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, good call out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spoke too soon. For a generator instance, we don't have the ID available currently, so to create a metric where we include the tenant label isn't feasible without a bit of refactor. This currently pushes the metric to the remote endpoint and would increase the series count, so perhaps this isn't something we want to do. Though, if we host the metric on the generator itself, then we'd have access to know which tenants would be filtering which spans, but likely this wouldn't have the desired value for folks who don't have access to those metrics. I.e. if a cloud user has a filter, they wouldn't be able to see how many spans are being rejected by their filter, which seems like the primary utility. I'm a little torn on even including this, but it does seem like we want some indication that the spans are being filtered out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i really think we shouldn't be pushing this using remote write. the metrics we push are the generated ones. this describes the operational state of tempo which would just publish normally.
if a cloud user has a filter, they wouldn't be able to see how many spans are being rejected by their filter, which seems like the primary utility. I'm a little torn on even including this, but it does seem like we want some indication that the spans are being filtered out.
we can push this back through billing and expose it to the end user. i think we may just need to push the tenant id down into the instance. alternatively you can push a counter metric into the instance that already has the tenant id configured
} | ||
|
||
for _, policy := range p.filterPolicies { | ||
if policy.Include != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could move the policy.include check into policyMatch()
which would clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what policyMatch()
should do in the case of a nil policy. I agree it would be a little cleaner, but include is the inverse of exclude here, so for example returning a true
value from policyMatch()
isn't as clean. Perhaps I can make exclude and include behave the same here, and then make the suggested adjustment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one broader question that just occurred to me: should all of this logic apply to both span metrics and service graph metrics? should we move this up a level?
Thanks for the review @joe-elliott. If we move this up a level, do you suppose the processors should have an independent filter config, or share one? Sharing one would be simpler and likely more performant, but I wonder if folks would want to tune them independently. |
In lieu of an specific idea about how widely to apply the filtering, I'm going to restructure the span filtering into |
b2df1e4
to
6e15687
Compare
ebced35
to
7fb6e5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really liking the refactor to pull out the filters. Some thoughts but this is close
spanMetricsCallsTotal registry.Counter | ||
spanMetricsDurationSeconds registry.Histogram | ||
spanMetricsSizeTotal registry.Counter | ||
spanMetricsFilterDropsTotal registry.Counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i really think we shouldn't be pushing this using remote write. the metrics we push are the generated ones. this describes the operational state of tempo which would just publish normally.
if a cloud user has a filter, they wouldn't be able to see how many spans are being rejected by their filter, which seems like the primary utility. I'm a little torn on even including this, but it does seem like we want some indication that the spans are being filtered out.
we can push this back through billing and expose it to the end user. i think we may just need to push the tenant id down into the instance. alternatively you can push a counter metric into the instance that already has the tenant id configured
pkg/spanfilter/spanfilter.go
Outdated
} | ||
matches++ | ||
case traceql.IntrinsicKind: | ||
if !stringMatch(policy.MatchType, span.GetKind().String(), pa.Value.(string)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyway we can do int matches here and on status? would be much faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yes, but it might be a little clumsy, since we'd need to take this as an int in the config I think. There is a special yaml struct tag we could use iirc to parse a string as an int, or perhaps some custom unmarshaling. For now I'm inclined to leave it and come back to it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can work this. I'll take a closer look next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a commit for this. It made some parts less readable, but seems okay to me. How's that look?
5214024
to
95fd22e
Compare
…nt overrides Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
1cec33b
to
79151bb
Compare
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
Signed-off-by: Zach Leslie <[email protected]>
What this PR does:
Here we implement an approach to filtering out spans based on a policy, loosely based around the OTEL collector filterspan config format.
Which issue(s) this PR fixes:
Fixes #1482
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]