[Traceql Metrics] PR 1 - Engine #3251
Conversation
@@ -23,7 +33,8 @@ type typedExpression interface {
}

type RootExpr struct {
-	Pipeline Pipeline
+	Pipeline        Pipeline
+	MetricsPipeline metricsFirstStageElement
Having the metrics pipeline split from the spanset filter pipeline isn't ideal, but it's the best option for now. We send the eval callback of the spanset pipeline into the storage SecondPass, while the metrics pipeline has a different signature (its output is time series). Long-term this split needs to be closed so that we end up with a single pipeline, but the current form will carry the functionality a long way. It will work until we add cross-time-series arithmetic like:
({a} | rate()) /
({b} | rate())
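To make the signature mismatch concrete, here is a minimal Go sketch of why the two stages can't share one interface yet. All names (Spanset, TimeSeries, countStage) are illustrative stand-ins, not Tempo's real types: the spanset stage is a spanset-to-spanset transform whose callback can be handed to storage, while the metrics stage accumulates observations and emits series.

```go
package main

import "fmt"

// Stand-in types; the real TraceQL engine types are much richer.
type Spanset struct{ Spans []int }
type TimeSeries map[string][]float64

// A spanset filter stage maps spansets to spansets, so its evaluate
// callback can be passed straight into the storage layer's second pass.
type pipelineElement interface {
	evaluate([]*Spanset) ([]*Spanset, error)
}

// A metrics first stage instead observes spansets and emits time
// series at the end, which is why it cannot satisfy the interface above.
type metricsFirstStageElement interface {
	observe(*Spanset)
	result() TimeSeries
}

// countStage is a toy metrics stage: it totals spans seen.
type countStage struct{ total float64 }

func (c *countStage) observe(ss *Spanset) { c.total += float64(len(ss.Spans)) }
func (c *countStage) result() TimeSeries  { return TimeSeries{"{}": {c.total}} }

func main() {
	var m metricsFirstStageElement = &countStage{}
	m.observe(&Spanset{Spans: []int{1, 2, 3}})
	m.observe(&Spanset{Spans: []int{4}})
	fmt.Println(m.result()["{}"][0]) // 4
}
```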
return e.metricsPipeline.result(), nil
}

// SpanDeduper2 is EXTREMELY LAZY. It attempts to dedupe spans for metrics
This will be an area where we need to continue experimenting. For now I've chosen the approach that I think offers the best trade-offs. We can easily increase correctness at the cost of performance.
Currently it dedupes using a hash of trace ID and span start time. The downside is that this is obviously not 100% unique, but the performance is great because the query doesn't need to fetch any additional columns; we already load them for backend queries (the trace ID will be used for sharding). Deduping on trace ID/span ID is the obviously 100%-correct method and where we started, but the span ID column is too hefty and significantly reduces performance. A middle-ground option would be the nested-set span ID (an integer), which is also unique but lighter-weight than the span ID. However, it's only populated for complete traces, so we'd need some fallback for partial traces (back to using start time?), and there's no good way to access it directly; we decided not to expose it through the Fetch layer.
Additionally, the data structure used is important too. The current implementation is a bunch of maps of 32-bit hashes. It has good performance but an increased chance of collision vs 64-bit hashes (or storing trace ID + span ID in the maps directly). We use the last byte of the trace ID to perform a single layer of 1-byte sharding, which reduces the pressure on any single map.
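A minimal sketch of the sharded structure described above, assuming FNV-32a as the hash (Tempo's actual hash choice and type names may differ): spans are keyed by a 32-bit hash of trace ID + start time, and the last byte of the trace ID selects one of 256 maps.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// spanDeduper is an illustrative stand-in, not Tempo's SpanDeduper2.
// 256 shards keyed by the last byte of the trace ID spread the load
// so no single map takes all the pressure.
type spanDeduper struct {
	shards [256]map[uint32]struct{}
}

func newSpanDeduper() *spanDeduper {
	d := &spanDeduper{}
	for i := range d.shards {
		d.shards[i] = make(map[uint32]struct{})
	}
	return d
}

// Skip reports whether this (traceID, startTime) pair was already seen.
// Two distinct spans can collide on the 32-bit hash, trading a small
// chance of under-counting for never fetching the span ID column.
func (d *spanDeduper) Skip(traceID []byte, startTime uint64) bool {
	h := fnv.New32a()
	h.Write(traceID)
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], startTime)
	h.Write(buf[:])
	v := h.Sum32()

	shard := d.shards[traceID[len(traceID)-1]]
	if _, ok := shard[v]; ok {
		return true
	}
	shard[v] = struct{}{}
	return false
}

func main() {
	d := newSpanDeduper()
	tid := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(d.Skip(tid, 100)) // false: first sighting
	fmt.Println(d.Skip(tid, 100)) // true: duplicate
	fmt.Println(d.Skip(tid, 200)) // false: different start time
}
```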
This looks like a good start to me.
// has no chance for collisions (whereas a hash32 has a non-zero chance of
// collisions). However it means we have to arbitrarily set an upper limit on
// the maximum number of values.
type FastValues [maxGroupBys]Static |
This is used for by (foo, bar), right? Is the order of Static values in FastValues relevant?
Correct, there is a direct match between the group-bys and the values in the array. by(foo,bar) will have values [ <f>, <b>, nil, nil, nil ]. The surrounding GroupingAggregator has knowledge of both and combines them to create the full time series at the end: { foo="<f>", bar="<b>" }
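The payoff of the fixed-size array can be sketched in a few lines of Go. maxGroupBys and Static here are illustrative stand-ins (real TraceQL Statics are typed values, not strings): because the array length is fixed at compile time, FastValues is comparable and can be used directly as a map key with no hashing or allocation.

```go
package main

import "fmt"

// Illustrative constants/types; Tempo's definitions are richer.
const maxGroupBys = 5

type Static string // real Statics carry typed values, not just strings

// FastValues is a comparable fixed-size grouping key.
type FastValues [maxGroupBys]Static

func main() {
	series := map[FastValues]int{}

	// by(foo,bar): position 0 holds foo's value, position 1 holds
	// bar's, and the remaining slots stay at their zero value (nil
	// in the real engine).
	key := FastValues{"<f>", "<b>"}
	series[key]++
	series[key]++

	fmt.Println(series[FastValues{"<f>", "<b>"}]) // 2
}
```

Using the array itself as the map key is what makes this "fast": a slice of Statics would need an explicit hash before every lookup.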
What this PR does:
This is the first step in supporting TraceQL metrics queries such as {status=error} | rate() by (resource.service.name). It adds support to the language, parser, and engine for querying the two aggregates rate() and count_over_time(), and includes optional grouping by(a,b,c).
Features specifically not included:
- step from the query. This means rate() works, but not rate(5m). This is vastly simpler and the best way to get started; the step interval is easily set in the Grafana UI.
Some less-related updates:
- Changed the select statement to take a list of attributes/intrinsics and not generic FieldExpressions. Queries like select(1+.foo) would parse and "run", but not return anything. I believe this was an oversight, so I fixed it by making the language parsing more strict. The main driver is that I wanted to reuse the attributeList spec for both select(a,b,c) and rate/count by(a,b,c).
Notes
This is one entry in a set of chained PRs. The full body of work has been split into separate buckets to make reviews and updates more manageable.
Which issue(s) this PR fixes:
n/a
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]