Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace CombineTraceProtos with new Combiner #1291

Merged
merged 7 commits into from
Feb 24, 2022
Merged

Conversation

mdisibio
Copy link
Contributor

What this PR does:
This PR introduces a new Combiner which is similar to CombineTraceProtos but more efficient when combining more than two inputs. The previous pairwise usage of CombineTraceProtos had a couple inefficiencies when combining more than two inputs: (a) the intermediate result was sorted every time (b) the hash of span tokens was rebuilt every time. Combiner is stateful and improves this, which leads to significant reduction in cpu and memory. Performance is identical when combining just 2 inputs.

Additionally, this PR changes the span/token hashing to 64-bit to reduce the collision rate. Experimentally the collision rate of fnv32 approached 1 in 10,000 spans, which is significant because any collision results in a dropped span. 64-bit has no collisions up to the tested limit of a trace with 1M spans. Performance is still good.

Feedback
In order to maintain identical performance against 2 segments, Combiner must not save the span tokens for the second input, like how CombineTraceProtos did not. This can be generalized in that we never need to save the span tokens for the last input. These savings are significant enough to where it's worth accounting for, and there are many cases where we do know the length. I would like feedback on the chosen ergonomics/naming/style and see if there is a better pattern. For example:

// Unknown number of inputs
c := NewCombiner()
c.Consume(tr)
c.Consume(tr)
    ...
c.Result()

// Known number of inputs
c := NewCombiner()
for i, tr := range traces {
    c.ConsumeWithFinal(tr, i == len(traces)-1)
}
c.Result()

Benchmarks
This benchmark combines trace 2 to 8 segments of 100K spans each. Improvements are greater as more segments are combined.

BenchmarkCombine/2/CombineTraceProtos-12         	      27	  44842636 ns/op	 6316197 B/op	  115510 allocs/op
BenchmarkCombine/2/Combiner-12                   	      25	  44691228 ns/op	 6319784 B/op	  115615 allocs/op

BenchmarkCombine/3/CombineTraceProtos-12         	       9	 125415228 ns/op	17512674 B/op	  291023 allocs/op
BenchmarkCombine/3/Combiner-12                   	      15	  72986714 ns/op	11202546 B/op	  175560 allocs/op

BenchmarkCombine/4/CombineTraceProtos-12         	       5	 241490759 ns/op	35993108 B/op	  524033 allocs/op
BenchmarkCombine/4/Combiner-12                   	      10	 108902778 ns/op	18498868 B/op	  233082 allocs/op

BenchmarkCombine/8/CombineTraceProtos-12         	       1	1119084275 ns/op	164185168 B/op	 2061117 allocs/op
BenchmarkCombine/8/Combiner-12                   	       5	 242659052 ns/op	 37625163 B/op	  469882 allocs/op

Which issue(s) this PR fixes:
Should help #976

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mdisibio
Copy link
Contributor Author

@tanner-bruce Would appreciate your feedback too as you've also done of a lot of investigation in this area.

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drooling over some of these benchmarks. We should definitely see some improvements in resource usage during both compaction and querying. A couple of baby nits and one real question.

Nice!

pkg/model/trace/combine_test.go Outdated Show resolved Hide resolved
pkg/model/trace/combine.go Outdated Show resolved Hide resolved
modules/frontend/tracebyidsharding.go Show resolved Hide resolved
pkg/model/trace/combine.go Outdated Show resolved Hide resolved
@mdisibio
Copy link
Contributor Author

Pushed an optimization to alloc the span map using the first input size, which saves a few more MB per call. Thanks @tanner-bruce !

BenchmarkCombine/2/Combiner-12         	      27	  41954908 ns/op	 4650348 B/op	  113443 allocs/op
BenchmarkCombine/3/Combiner-12         	      15	  72924713 ns/op	 9526314 B/op	  173208 allocs/op
BenchmarkCombine/4/Combiner-12         	      10	 105436694 ns/op	16823280 B/op	  230750 allocs/op
BenchmarkCombine/8/Combiner-12         	       5	 237391203 ns/op	35946148 B/op	  467689 allocs/op

@mdisibio mdisibio requested a review from joe-elliott February 24, 2022 12:45
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for the help on this one @tanner-bruce!

@mdisibio mdisibio enabled auto-merge (squash) February 24, 2022 17:55
@mdisibio mdisibio merged commit 7d26ee2 into grafana:main Feb 24, 2022
@mdisibio mdisibio deleted the combiner branch April 25, 2023 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants