Frontend batching #2677

Merged: 22 commits, Jul 20, 2023
Conversation

@joe-elliott (Member) commented Jul 19, 2023

What this PR does:
Batches jobs in the requests sent from the query-frontend queue to the queriers. Previously, the frontend sent each job one at a time in an individual HTTP request. This PR adds a configurable parameter that allows the frontend to send more than one job in a single request.
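
For intuition, the core change can be pictured as a dequeue loop that drains up to a configured number of jobs before making one request, instead of one request per job. The sketch below is only an illustration of that idea; the names (job, forwardJobs, sendBatch, maxBatchSize) are hypothetical and are not the actual Tempo identifiers.

package frontend

import "context"

// job stands in for a single query-frontend job; the real type lives in Tempo.
type job struct {
	// request payload, response channel, etc.
}

// forwardJobs drains up to maxBatchSize jobs from the queue and hands them to
// sendBatch as one request, instead of issuing one request per job.
func forwardJobs(ctx context.Context, queue <-chan *job, maxBatchSize int, sendBatch func([]*job) error) error {
	for {
		batch := make([]*job, 0, maxBatchSize)

		// Block for the first job so an empty batch is never sent.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case j := <-queue:
			batch = append(batch, j)
		}

		// Opportunistically top up the batch without waiting.
	fill:
		for len(batch) < maxBatchSize {
			select {
			case j := <-queue:
				batch = append(batch, j)
			default:
				break fill
			}
		}

		if err := sendBatch(batch); err != nil {
			return err
		}
	}
}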

Other changes:

  • Docs, of course! Including an update to the search performance tuning doc with more current information.
  • Adds a new histogram metric, tempo_query_frontend_actual_batch_size, to track the actual size of the batches being farmed out to the queriers (a sketch of how such a metric might be wired up follows this list).
  • Better testing of the queues and the frontend worker.
  • Added the ability for the querier to signal to the frontend which features it supports, enabling seamless rollouts.
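
As a hedged sketch of that histogram: the metric name comes from this PR, but the bucket layout, variable names, and registration style below are illustrative only, not necessarily what the PR ships.

package frontend

import "github.com/prometheus/client_golang/prometheus"

// actualBatchSize tracks how many jobs were packed into each request sent to
// a querier. The buckets below are a guess for illustration purposes.
var actualBatchSize = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: "tempo",
	Subsystem: "query_frontend",
	Name:      "actual_batch_size",
	Help:      "Number of jobs sent to a querier in a single batched request.",
	Buckets:   prometheus.LinearBuckets(1, 1, 10), // 1, 2, ..., 10 jobs
})

func init() {
	prometheus.MustRegister(actualBatchSize)
}

// observeBatch would be called once per batched request sent to a querier.
func observeBatch(jobCount int) {
	actualBatchSize.Observe(float64(jobCount))
}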

Performance testing
The goal of the setup was to create a cluster that could execute all 36k jobs created by the test query simultaneously. This way, job throughput from frontend -> querier could be tested more directly.

  • 80 queriers
  • 500 jobs per querier
  • Total cluster capacity 40k jobs
  • No reliance on serverless

Results

batch size    overall query latency     p99 job time in queue
1             8.5s                      4.9s
2             7.6s                      2.4s
5             6.7s                      1s
10            9s                        4.4s
1*            9.6s                      9s

*current image

The overall latency of queries where total jobs > total cluster capacity was not reduced as impressively, but this is a good step in the right direction.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
@mdisibio (Contributor) left a comment:

This looks great, and I like the way it is controlled by querier features. A few small questions, but none are blocking, so I will go ahead and approve.

Review comments on:

  • modules/frontend/queue/queue.go
  • modules/frontend/v1/request_batch.go
  • modules/querier/worker/frontend_processor.go
  • modules/frontend/v1/frontend.go
@zalegrala (Contributor) left a comment:

This looks pretty good to me; a nice improvement. It will be interesting to see the results on the dashboard. I had a question about the context handling, but it's not blocking.

// then error out this upstream request _and_ stream.
case err := <-errs:
	req.err <- err
	err = reportResponseUpstream(reqBatch, errs, resps)
Contributor comment:

Do we have a context to pass? Wondering if it might simplify the context handling below.

@joe-elliott (Member, Author):

If the streaming gRPC server connection itself drops or the context is cancelled, then .Send() returns an error and this case is hit:
https://github.com/grafana/tempo/pull/2677/files#diff-0914703aed52090bd72851004df203444207d9d48677c10860b0459afef1a0b9R311

If the request is cancelled upstream then this case is hit:
https://github.com/grafana/tempo/pull/2677/files#diff-0914703aed52090bd72851004df203444207d9d48677c10860b0459afef1a0b9R304

If the requests are cancelled downstream, then we get an HTTP response and this case is hit:
https://github.com/grafana/tempo/pull/2677/files#diff-0914703aed52090bd72851004df203444207d9d48677c10860b0459afef1a0b9R304

I think everything is covered.
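
Putting those three paths together, the handling can be pictured roughly as a select over the request context, the stream-send error channel, and the response channel. This is a simplified sketch under assumed names (awaitOutcome, response, finish, fail); it is not the code in this PR, just an illustration of the cases described above.

package frontend

import "context"

// response is a placeholder for the HTTP status and body returned by a querier.
type response struct{}

// awaitOutcome illustrates the three outcomes described above for a single
// upstream request: upstream cancellation, a failed stream Send, or a
// response (including downstream cancellations surfaced as HTTP responses).
func awaitOutcome(reqCtx context.Context, errs <-chan error, resps <-chan *response, finish func(*response), fail func(error)) {
	select {
	case <-reqCtx.Done():
		// The request was cancelled upstream.
		fail(reqCtx.Err())
	case err := <-errs:
		// The streaming gRPC connection dropped or its context was cancelled,
		// so .Send() returned an error: fail the request and the stream.
		fail(err)
	case resp := <-resps:
		// Normal completion, or a downstream cancellation that came back as
		// an HTTP response from the querier.
		finish(resp)
	}
}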
