Sort traces on flush to ensure consistent payloads in the backend #606

Merged
merged 9 commits into from
Mar 23, 2021

@mdisibio mdisibio commented Mar 23, 2021

What this PR does:
We have noticed that with a replication factor of 3 (RF3), the compactors seem to be combining more traces than expected. All ingesters should be receiving identical payloads and flushing identical bytes to the backend, so other than for long-running traces, the compactors should not have to recombine much. Research shows that the assumption that ingesters flush identical bytes is incorrect. The ingesters all flushed the same data in total for a trace, but internally the batches were not in the same order. This produces differing bytes for the same trace, which the compactor then has to recombine.

This PR internally sorts the traces as they are flushed by the ingesters. The sort order doesn't really matter, as long as it is consistent. Right now it sorts bottom up by span start time, then span id.
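
As a rough illustration of the idea (not the code in this PR), the sketch below sorts a trace bottom up: spans inside each batch first, then the batches by their first span. `Span`, `Batch`, `SortTrace`, and `Encode` are hypothetical, simplified stand-ins for the real trace types and serialization, assumed here only to show that two copies of a trace with differently ordered batches encode to the same bytes once sorted.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"sort"
)

// Span is a hypothetical, simplified stand-in for a trace span.
type Span struct {
	ID        []byte
	StartNano uint64
}

// Batch is a hypothetical group of spans, as received in one push.
type Batch struct {
	Spans []Span
}

// compareSpans orders spans by start time, then by span ID.
func compareSpans(a, b Span) int {
	if a.StartNano != b.StartNano {
		if a.StartNano < b.StartNano {
			return -1
		}
		return 1
	}
	return bytes.Compare(a.ID, b.ID)
}

// SortTrace sorts bottom up and in place: spans within each batch,
// then batches by their (now first) span. Empty batches sort first.
// Batches whose first spans compare equal keep their input order.
func SortTrace(batches []Batch) {
	for i := range batches {
		spans := batches[i].Spans
		sort.SliceStable(spans, func(x, y int) bool {
			return compareSpans(spans[x], spans[y]) < 0
		})
	}
	sort.SliceStable(batches, func(x, y int) bool {
		a, b := batches[x].Spans, batches[y].Spans
		if len(a) == 0 || len(b) == 0 {
			return len(a) == 0 && len(b) != 0
		}
		return compareSpans(a[0], b[0]) < 0
	})
}

// Encode is a toy serialization (assumption: it stands in for whatever
// the ingester writes). The output depends on batch and span order.
func Encode(batches []Batch) []byte {
	var out []byte
	var buf [8]byte
	for _, b := range batches {
		for _, s := range b.Spans {
			binary.BigEndian.PutUint64(buf[:], s.StartNano)
			out = append(out, buf[:]...)
			out = append(out, s.ID...)
		}
	}
	return out
}
```

If two ingesters each call `SortTrace` on their copy of the trace before `Encode`, the resulting bytes match even when the batches arrived in a different order. The particular comparator is arbitrary; what matters is that every ingester applies the same one.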

We also looked into possible causes of the differing batch order. It is possible that ring.DoBatch in the distributor is not issuing the batches for a given trace to each ingester in the same order. Sorting the indexes there was tried as a fix, but it did not solve the issue.

Which issue(s) this PR fixes:
n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mdisibio mdisibio marked this pull request as draft March 23, 2021 13:08
@mdisibio mdisibio marked this pull request as ready for review March 23, 2021 14:38
@joe-elliott joe-elliott merged commit 3c8dc30 into grafana:master Mar 23, 2021
@mdisibio mdisibio deleted the objects-combined-fix branch May 27, 2021 13:25