feat(spans): Use gauges to report self and total time to lower costs #3448
I like the idea! This will pretty much double the number of buckets we send between Relays and to Kafka. Can we put the new metrics behind a feature flag?
relay-dynamic-config/src/defaults.rs
Outdated
    },
    MetricSpec {
        category: DataCategory::Span,
        mri: "g:spans/self_time@millisecond".into(),
Can we stick with `exclusive_time` and `duration` instead of `self_time` and `total_time`? Or would that cause naming clashes in some downstream component that ignores the type prefix?
I think we can stick with them, but we already rename them to `total_time` and `self_time` in the UI. I thought this was a good occasion to stop having to do that.
Is there any reason not to change the names?
Is the purpose of this PR to experiment with what the product would look like without distributions, or is it already decided and we now need the double write for the transition period?
If it's just for experimenting, I don't think we actually need to record a gauge metric to see what the product would look like without distributions. Distributions are (almost) a superset of gauges, so we should be able to query `min`, `max`, `sum`, and `count` from the distributions table just as easily as from the gauges table.
This is not an experiment. The goal is to "downgrade" the metric to lower its cost: we only use averages, and a solution for percentiles won't come from improving distributions. Gauges are cheaper to store than distributions.
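To make the trade-off in this thread concrete, here is a minimal sketch of how a distribution's raw values collapse into a gauge-style summary. `GaugeSummary` and `summarize` are hypothetical names for illustration, not Relay's actual types, and the five fields are an assumption about what a gauge bucket carries. The point is that the summary has a fixed size no matter how many values were recorded, and that an average only needs `sum` and `count`.

```rust
/// Hypothetical gauge summary (not Relay's real type): a fixed set of
/// numbers per bucket, independent of how many values were recorded.
#[derive(Debug, Clone, Copy, PartialEq)]
struct GaugeSummary {
    last: f64,
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

impl GaugeSummary {
    /// The average is all the product needs, and it only requires sum and count.
    fn avg(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Collapses the raw values of a distribution into a gauge summary.
/// Returns `None` when there are no values to summarize.
fn summarize(values: &[f64]) -> Option<GaugeSummary> {
    let (&first, rest) = values.split_first()?;
    let mut gauge = GaugeSummary {
        last: first,
        min: first,
        max: first,
        sum: first,
        count: 1,
    };
    for &value in rest {
        gauge.last = value;
        gauge.min = gauge.min.min(value);
        gauge.max = gauge.max.max(value);
        gauge.sum += value;
        gauge.count += 1;
    }
    Some(gauge)
}

fn main() {
    // Illustrative span self times in milliseconds.
    let self_times_ms = [12.0, 3.5, 40.0, 8.5];
    let gauge = summarize(&self_times_ms).expect("at least one value");
    println!("{gauge:?}, avg = {} ms", gauge.avg());
}
```

In this sketch `last` depends on arrival order, which may be one reason the comment above says distributions are only "(almost)" a superset of gauges.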
    ];

    if double_write_distributions_as_gauges {
        metrics.append(&mut vec![
Suggested change:
- metrics.append(&mut vec![
+ metrics.extend([
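For context on the suggestion, a small standalone sketch comparing the two calls. The function names are hypothetical, and `g:spans/total_time@millisecond` is an assumed MRI inferred from the naming discussion; only `g:spans/self_time@millisecond` appears in the diff. Both functions produce the same vector, but `extend` takes the array directly instead of building a temporary `Vec` just to drain it.

```rust
/// `Vec::append` takes `&mut Vec<T>`, so a temporary Vec has to be
/// allocated and is then emptied into `metrics`.
fn with_append(mut metrics: Vec<&'static str>) -> Vec<&'static str> {
    metrics.append(&mut vec![
        "g:spans/self_time@millisecond",
        "g:spans/total_time@millisecond", // assumed name, see lead-in
    ]);
    metrics
}

/// `Extend::extend` accepts any `IntoIterator`, including a plain array,
/// so no intermediate Vec is needed.
fn with_extend(mut metrics: Vec<&'static str>) -> Vec<&'static str> {
    metrics.extend([
        "g:spans/self_time@millisecond",
        "g:spans/total_time@millisecond", // assumed name, see lead-in
    ]);
    metrics
}

fn main() {
    let base = vec!["existing_metric"];
    let appended = with_append(base.clone());
    let extended = with_extend(base);
    println!("same result: {}", appended == extended);
}
```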
We're currently storing self and total times for spans as distribution metrics. Those store a lot of information in order to be flexible enough to calculate percentiles.
Querying percentiles from these distributions has been too slow to display on any screen, so we decided to get percentiles a different way. That means it's no longer necessary to keep storing self and total times as distributions, and we could use gauges instead to lower our cost.
The plan is to add those new gauge metrics, record them for a while, add support to query them in the product behind a feature flag and switch to them when we're happy with the result.
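A rough sketch of the double-write step in that plan, under stated assumptions: `MetricSpec` here is a stripped-down stand-in for the real struct in `relay-dynamic-config`, the function name is hypothetical, and every MRI except `g:spans/self_time@millisecond` is an assumed placeholder. The flag name matches the one in the diff.

```rust
/// Simplified stand-in for the real `MetricSpec`; only the MRI is modeled.
#[derive(Debug, Clone, PartialEq)]
struct MetricSpec {
    mri: String,
}

/// Builds the span metric specs. With the flag off, only the existing
/// distributions are emitted; with it on, the gauges are written as well.
fn span_metric_specs(double_write_distributions_as_gauges: bool) -> Vec<MetricSpec> {
    // Existing distribution metrics (MRIs assumed for illustration).
    let mut metrics = vec![
        MetricSpec { mri: "d:spans/exclusive_time@millisecond".into() },
        MetricSpec { mri: "d:spans/duration@millisecond".into() },
    ];

    if double_write_distributions_as_gauges {
        // New, cheaper gauge metrics recorded alongside the distributions
        // during the transition period.
        metrics.extend([
            MetricSpec { mri: "g:spans/self_time@millisecond".into() },
            MetricSpec { mri: "g:spans/total_time@millisecond".into() },
        ]);
    }

    metrics
}

fn main() {
    for flag in [false, true] {
        let specs = span_metric_specs(flag);
        println!("flag = {flag}: {} metric specs", specs.len());
    }
}
```

Keeping the gauges behind the flag means the extra buckets are only sent where the flag is enabled, which addresses the bucket-volume concern raised in review.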