Generator performance #4232

Merged
7 commits merged into grafana:main on Oct 25, 2024

Conversation

@mdisibio (Contributor) commented Oct 24, 2024

What this PR does:
The metrics-generator span metrics and service graphs are very popular, but memory usage can be quite high in high-cardinality setups, for example 1+ million active series in a single pod. I noticed several areas where memory usage can be reduced.

This PR contains 2 updates:

(1) On the surface, it updates histograms to pre-compute all Prometheus labels at creation time instead of at collection time. This way we allocate the labels once instead of on every scrape. This is possible because all of the labels for a specific histogram bucket are fixed, even external labels, which are configured via runtime config. (A sketch of this idea follows item (2) below.)

(2) Almost more importantly, in my mind, it adds a suite of benchmarks for the generator, which include the WAL and a non-mock registry. These benchmarks will help us identify more improvements like (1). Skimming through the module, I've left several TODOs, large and small, on the next areas to update.
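
To make (1) concrete, here is a minimal, hedged sketch of the pattern. The type, field, and function names are illustrative stand-ins, not the actual registry code, and the plain `label` type stands in for the real Prometheus label types:

```go
package main

import (
	"fmt"
	"strconv"
)

// label/labelSet stand in for the real Prometheus label types; the point
// is only *when* the sets are built, not the concrete representation.
type label struct{ name, value string }
type labelSet []label

// precomputedHistogram sketches update (1): every sub-series (_count,
// _sum, and one _bucket per "le" boundary) gets its full label set,
// including external labels, built once when the series is created.
type precomputedHistogram struct {
	countLabels  labelSet
	sumLabels    labelSet
	bucketLabels []labelSet // one fixed label set per bucket
	bucketCounts []uint64
	count        uint64
	sum          float64
}

func newPrecomputedHistogram(name string, base, external labelSet, buckets []float64) *precomputedHistogram {
	common := append(append(labelSet{}, base...), external...)
	h := &precomputedHistogram{
		countLabels:  append(append(labelSet{}, common...), label{"__name__", name + "_count"}),
		sumLabels:    append(append(labelSet{}, common...), label{"__name__", name + "_sum"}),
		bucketCounts: make([]uint64, len(buckets)),
	}
	// A real Prometheus histogram would also need an "+Inf" bucket; omitted for brevity.
	for _, le := range buckets {
		h.bucketLabels = append(h.bucketLabels,
			append(append(labelSet{}, common...),
				label{"__name__", name + "_bucket"},
				label{"le", strconv.FormatFloat(le, 'f', -1, 64)}))
	}
	return h
}

// collect only reads the pre-built label sets; it no longer allocates
// labels on every scrape.
func (h *precomputedHistogram) collect(emit func(labelSet, float64)) {
	emit(h.countLabels, float64(h.count))
	emit(h.sumLabels, h.sum)
	for i, c := range h.bucketCounts {
		emit(h.bucketLabels[i], float64(c))
	}
}

func main() {
	h := newPrecomputedHistogram("traces_spanmetrics_latency",
		labelSet{{"service", "api"}}, labelSet{{"cluster", "prod"}},
		[]float64{0.1, 1, 10})
	h.collect(func(ls labelSet, v float64) { fmt.Println(ls, v) })
}
```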

Benchmarks show a large reduction in memory in Collect. Tested in an internal cluster, this amounted to roughly 15% total working-set savings.

          │  before.txt  │              after.txt              │
          │    sec/op    │    sec/op     vs base               │
PushSpans   1.421m ±  1%   1.441m ±  1%   +1.36% (p=0.002 n=6)
Collect     31.30µ ± 13%   10.34µ ± 14%  -66.96% (p=0.002 n=6)
geomean     210.9µ         122.1µ        -42.13%

          │ before.txt  │             after.txt             │
          │ heap_in_use │ heap_in_use  vs base              │
PushSpans   15.69M ± 1%   15.50M ± 0%  -1.20% (p=0.002 n=6)
Collect     15.68M ± 0%   15.49M ± 0%  -1.20% (p=0.002 n=6)
geomean     15.68M        15.50M       -1.20%

          │  before.txt   │              after.txt              │
          │     B/op      │     B/op      vs base               │
PushSpans   705.8Ki ±  0%   705.9Ki ± 0%        ~ (p=0.310 n=6)
Collect     31.27Ki ± 11%   14.47Ki ± 4%  -53.73% (p=0.002 n=6)
geomean     148.6Ki         101.1Ki       -31.98%

          │  before.txt  │              after.txt              │
          │  allocs/op   │  allocs/op    vs base               │
PushSpans   18.58k ±  0%   18.58k ±  0%        ~ (p=1.000 n=6)
Collect      71.00 ± 11%    21.00 ± 14%  -70.42% (p=0.002 n=6)
geomean     1.149k          624.7        -45.61%

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala (Contributor) left a comment

Looks good. I appreciate the TODOs also.

@@ -129,3 +137,140 @@ func (l testLogger) Log(keyvals ...interface{}) error {
	l.t.Log(keyvals...)
	return nil
}

func BenchmarkPushSpans(b *testing.B) {
Contributor:

I love the full stack benchmark. This is something we might want to do for other components as well.

}

b.StopTimer()
runtime.GC()
Contributor:

Why force a GC after the benchmark has been timed? Is this to avoid impact on later benchmarks, or to get an accurate memory summary below?

@mdisibio (Contributor, Author):

In this case I was trying to see if the benchmarks could help measure in-use memory, so it records HeapInUse at the bottom. Without the GC it just keeps growing between runs of the benchmark (-count=5, for example). So this was an attempt to make the in-use metric more useful.
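
For reference, a minimal sketch of that pattern (the helper name and package are hypothetical, not the exact PR code): after the timed section, force a GC so the measurement reflects retained memory rather than garbage left over from earlier runs, then report it as a custom metric so it shows up as the heap_in_use column in the tables above.

```go
package generator_test // package name is hypothetical

import (
	"runtime"
	"testing"
)

// reportHeapInUse is a hypothetical helper: stop the timer, force a GC
// so the measurement reflects retained memory rather than garbage from
// earlier -count runs, then attach heap-in-use as a custom benchmark
// metric that benchstat can compare across runs.
func reportHeapInUse(b *testing.B) {
	b.StopTimer()
	runtime.GC()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	b.ReportMetric(float64(ms.HeapInuse), "heap_in_use")
}
```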

@mdisibio mdisibio merged commit 2f2a35e into grafana:main Oct 25, 2024
16 checks passed
@zalegrala mentioned this pull request Oct 29, 2024
knylander-grafana pushed a commit that referenced this pull request Oct 29, 2024
* todos

* more todos and print inuse stats

* Benchmark report heapinuse, ensure cleanup between benchmarks

* Improve memory usage by changing histograms to precompute all labels for all sub-series instead of during each collection

* changelog