Sort traces on flush to ensure consistent payloads in the backend #606
What this PR does:
We noticed that with RF3 (replication factor 3) the compactors are combining more traces than expected. All ingesters should receive identical payloads and flush identical bytes to the backend, so apart from long-running traces the compactors should have little to recombine. Investigation showed that the assumption that ingesters flush identical bytes is incorrect: each ingester flushed the same data in total for a trace, but the batches within the trace were in a different order. This produces differing bytes, which the compactor then has to recombine.
This PR sorts each trace internally as the ingesters flush it. The specific sort order doesn't matter as long as it is consistent; currently it sorts bottom up by span start time, then by span ID.
Other possible causes considered for the differing batch order: it is possible that `ring.DoBatch` in the distributor does not issue the batches for a given trace in the same order to each ingester. Tried sorting `indexes` to fix this, but it did not solve the issue.
Which issue(s) this PR fixes:
n/a
Checklist
CHANGELOG.md updated (the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX])