Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search: drop use of TagCache, extract tags and tag values on-demand #1068

Merged
merged 30 commits into from
Dec 2, 2021

Conversation

yvrhdn
Copy link
Member

@yvrhdn yvrhdn commented Oct 21, 2021

What this PR does:
Instead of storing tags and tag values in the TagCache we can extract them directly from live traces, WAL and local blocks when requested. This should lower the amount of work spent on updating the tagcache during ingest.

Since tag values are extracted on-demand, we are not limited anymore by the size of the tag cache. So this makes it possible to return thousands of values when needed.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@yvrhdn
Copy link
Member Author

yvrhdn commented Oct 21, 2021

I've been doing a test on my local set up fetching tag values for load_generator.seq_num, this is a unique ID added to every trace by the synthetic-load-generator. Before these changes the amount of values always got truncated at about ~200 values. With this PR amount of values returned increases up to 80k and drops back to 20k when a block was flushed.

Avg latency sits around 200-400 ms.

@yvrhdn yvrhdn marked this pull request as ready for review October 21, 2021 17:17
@yvrhdn
Copy link
Member Author

yvrhdn commented Oct 22, 2021

TODO

  • Check if the context has been cancelled
    Screenshot 2021-10-22 at 13 45 05
  • Experiment with filtering tag values on service.name Next PR
  • Most time is spent searching live traces, can we speed this up?

@yvrhdn yvrhdn requested a review from mdisibio October 26, 2021 23:07
@yvrhdn yvrhdn added this to the v1.3 milestone Oct 29, 2021
@yvrhdn
Copy link
Member Author

yvrhdn commented Nov 8, 2021

Still to address: when fetching tag values with very high cardinality (for instance http.url) requests fail/time out when being sent between components. I.e. the ingester is fine digging up all the values, but the GRPC connection between the ingester, querier and query-frontend was hitting the max bytes size...
We should add some limits (optionally configurable) to protect against this happening. Ideally, we return early when we have 'a lot of' values.

Future improvements (not for this PR): we can add filter capabilities to the tag values endpoint, so if someone starts typing http.url=/myService/ we only return values containing /myService/. This would make high cardinality tags more useable.

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the TagCache struct?

@yvrhdn
Copy link
Member Author

yvrhdn commented Nov 8, 2021

Remove the TagCache struct?

Yep 👍, I've removed tempodb/search/tag_cache.go, it's all the way at the bottom of the diff
https://github.com/grafana/tempo/pull/1068/files#diff-b29b479cf004a8bf08fb55114702b94f6d818911fca0c83c985e771db2fbe4bb

@annanay25 annanay25 requested a review from joe-elliott December 1, 2021 13:40
Signed-off-by: Annanay <[email protected]>
Copy link
Contributor

@mdisibio mdisibio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting the limit check in. Good to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants