Explore Traces Homepage #279
base: main
I think we will need the topk API for getting the top erroring services @joe-elliott @mdisibio (something that we add frontend support for in the next iteration).
Service name probably makes more sense here.
Yes, I can update this.
The results aren't consistent because these are search queries that return traces. An easy way to make the results consistent is to change this to an instant metric query.
Nice addition to the app! Left a few comments in the code on some small things I noticed.
As Alex mentioned, the errors panel seems to react differently to the loading state than the duration panel, which seems to disappear.
As mentioned in another comment we're using search queries here and I'm not sure if that's the best option. Let's consider using metric queries to see what feels best before we make a final decision on what to ship.
For a query like […], I wonder whether the best move would be to group by […].
Grouping by trace ID or span ID isn't ideal: it will increase the load on Tempo, make the response huge and, as you noticed, won't be the easiest to work with in the app.
Since we're running a metrics query we're working on top of aggregated data, so it doesn't make sense to group by ID IMO, it defeats the purpose of the data aggregation in this case. The best we can use are the exemplars if we want examples of trace IDs. |
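To illustrate grouping on an aggregated attribute rather than on an ID, here is a minimal sketch that builds a TraceQL metrics query string. The helper name `buildErrorRateQuery`, the `{status = error}` filter and the default attribute are assumptions chosen for illustration; this is not the query used in the PR.

```typescript
// Hypothetical helper: builds a TraceQL metrics query that aggregates
// error spans per attribute value instead of per trace or span ID.
function buildErrorRateQuery(groupBy: string = 'resource.service.name'): string {
  // Assumed filter: spans whose status is error.
  return `{status = error} | rate() by (${groupBy})`;
}

const errorsByService = buildErrorRateQuery();
console.log(errorsByService);
```

Grouping by a low-cardinality attribute such as the service name keeps the number of returned series small, which is the point made above about working on aggregated data.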
Looking better! Still have a few more comments before approving.
if (type === 'errors') {
  const getLabel = (df: DataFrame) => {
    const valuesField = df.fields.find((f) => f.name !== 'time');
    return valuesField?.labels?.['resource.service.name'].slice(1, -1) /* remove quotes */ ?? 'Service name not found';
.slice(1,-1) is too brittle: if we remove the quotes somewhere else, this will eat away at the service name. replace('"', "") is safer.
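As a minimal sketch of the suggestion above: `String.prototype.replace` with a string pattern only replaces the first match, so an anchored regular expression is one way to drop a leading and a trailing quote only when they are present. The helper name `stripSurroundingQuotes` and its fallback parameter are hypothetical and not part of the PR.

```typescript
// Hypothetical helper: removes one leading and one trailing double quote,
// but only if they are actually there. Unquoted names are returned unchanged.
function stripSurroundingQuotes(
  value: string | undefined,
  fallback: string = 'Service name not found'
): string {
  if (value === undefined) {
    return fallback;
  }
  return value.replace(/^"|"$/g, '');
}

console.log(stripSurroundingQuotes('"checkout"'));
console.log(stripSurroundingQuotes('checkout'));
```

Handling `undefined` explicitly also avoids calling a string method on a missing label.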
let yBuckets = data.data?.series.map((s) => parseFloat(s.fields[1].name)).sort((a, b) => a - b);
if (yBuckets?.length) {
  const slowestBuckets = Math.floor(yBuckets.length / 4);
  let minBucket = yBuckets.length - slowestBuckets - 1;
  if (minBucket < 0) {
    minBucket = 0;
  }

  const minDuration = yBucketToDuration(minBucket - 1, yBuckets);
This is copied code, right? It could likely be refactored into a function that can be reused to find the slowest threshold.
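One possible shape for such a shared function, as a sketch only: it mirrors the index arithmetic from the snippet under review and leaves the conversion to a duration (`yBucketToDuration` in the PR) to the caller. The name `slowestQuartileStartIndex` is hypothetical.

```typescript
// Hypothetical helper: given sorted bucket values, returns the index of the
// first bucket that belongs to the slowest quarter, clamped at 0.
function slowestQuartileStartIndex(yBuckets: number[]): number {
  if (yBuckets.length === 0) {
    return 0;
  }
  const slowestBuckets = Math.floor(yBuckets.length / 4);
  return Math.max(0, yBuckets.length - slowestBuckets - 1);
}

// Example with eight sorted bucket values.
const exampleBuckets = [1, 2, 4, 8, 16, 32, 64, 128];
console.log(slowestQuartileStartIndex(exampleBuckets));
```

Both call sites could then share this index calculation and pass the result on to the existing duration conversion.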
children: [
  new AttributePanel({
    query: {
      query: `{nestedSetParent<0 && kind=server && duration > ${minDuration}} | by (resource.service.name)`,
| by(resource.service.name) isn't doing anything, since this is a search query and not a metrics query.
Also, this will suffer from the same issue the errors table had: the results won't be consistent across refreshes, since it's a search query.
In this PR we have added a home page for Explore Traces! 🥳 🚀
The homepage allows you to get a quick overview of your services without the need to add filters or select your desired RED metric.
Through this view you can easily see your services with errors and also your slowest services: two key areas that we constantly strive to monitor.
Keep in mind this is only a v1 of the homepage; there is much more planned! Thanks @nadinevehling for all your work!