Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TraceQL query hint to retrieve most recent results ordered by trace start time #4238

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

joe-elliott
Copy link
Member

@joe-elliott joe-elliott commented Oct 25, 2024

What this PR does

Adds a query hint with (most_recent=true) that causes Tempo to perform a more intense search to return the most recent traces ordered by start time. This is currently implemented as a query hint b/c (depending on the circumstances) it can have a quite large impact on time to return results in both streaming grpc and discrete http paths. A TraceQL hint was chosen over a query param b/c it is desired longer term that this is the default behavior and it will be easier to remove a query hint then a query parameter.

This change is implemented at the query frontend, query engine and ingester search levels. In all cases it is required to store the most recent results and continue searching instead of immediately returning once a limit is hit.

Other Changes

Which issue(s) this PR fixes:
This PR will give a way to always return the most recent results ordered by start time which partially addresses #3777, #3109 and #2659.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Copy link
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the docs! Updates look good.

}
i.headBlockMtx.RUnlock()
if err := anyErr.Load(); err != nil {
return nil, err
}
if combiner.Count() >= maxResults {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One benefit of this early return was that it didn't have to take the blocks mutex if the search was satisfied by the head block. Is that still possible if the query isn't using the mostrecent hint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that check has been moved here. for the anyCombiner the time is ignored and it's just the old limit check.

modules/frontend/combiner/search.go Outdated Show resolved Hide resolved
modules/frontend/search_sharder.go Outdated Show resolved Hide resolved
pkg/traceql/combine.go Outdated Show resolved Hide resolved

if c.Count() == c.keepMostRecent && c.keepMostRecent > 0 {
// if this is older than the oldest element, bail
if c.OldestTimestampNanos() > new.StartTimeUnixNano {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that this function is correct, but I need help understanding a bit more. Wondering if it should check the new timestamp cutoff before merging with existing? Or generally: how the timestamp/sorting works when merging. If we get an older partial snippet of an existing entry, it will overwrite the start time to be older. In that case should the result be kicked out? If the trace had been fully combined into one snippet (perfect compaction), it seems like it would have been kicked out (properly counted as older than other results).

Copy link
Member Author

@joe-elliott joe-elliott Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, these are good callouts. i wrote this logic thinking about the combiner being used in the engine, but the truth is there is all kinds of weird behavior with fractured traces when used in the query frontend. i don't know if there are good answers for all these edge cases without reconsiderations at the storage layer. this may be a feature that is always best effort.

This is weird:

  • Add metadata for trace id 1234 to the most recent list
  • It gets pushed out by more recent traces
  • Receive a new shard for 1234 that re-qualifies it for the most recent list
  • We correctly return 1234 but have lost some information about it

Or this:

  • Receive 10 "most recent" traces and return a result to the client
  • There is a trace shard earlier in the search window which pushes one of the returned results out of the most recent list.
  • We have incorrectly returned a trace that should not have been in the list.

modules/frontend/search_sharder.go Outdated Show resolved Hide resolved
@joe-elliott joe-elliott requested a review from mdisibio November 7, 2024 20:31
Copy link
Contributor

github-actions bot commented Jan 7, 2025

This PR has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. This pull request will be closed in 15 days if there is no new activity.
Please apply keepalive label to exempt this Pull Request.

@github-actions github-actions bot added the stale Used for stale issues / PRs label Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Used for stale issues / PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants