Reduce impact of backendRequests on latency #2530

Merged: 7 commits, Jun 2, 2023

joe-elliott (Member) commented Jun 1, 2023

What this PR does:
Uses a channel to return jobs from backendRequests instead of returning them as a slice. This nicely improves performance for queries that create a huge number of jobs.
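
As a rough illustration of the difference (not the code in this PR), here is a minimal sketch. The `job` type and the `buildJobsSlice`, `streamJobs` and `consume` helpers are hypothetical names. The slice version hands nothing to the caller until every job exists, while the channel version lets the caller start on the first job immediately and cancel the producer once it has enough.

```go
package main

import (
	"context"
	"fmt"
)

// job is a hypothetical stand-in for one backend sub-request.
type job struct{ id int }

// buildJobsSlice mirrors the old shape: the caller gets nothing
// until all n jobs have been built.
func buildJobsSlice(n int) []job {
	jobs := make([]job, 0, n)
	for i := 0; i < n; i++ {
		jobs = append(jobs, job{id: i})
	}
	return jobs
}

// streamJobs mirrors the new shape: each job is sent as soon as it is
// built, and the producer stops early if the context is cancelled.
func streamJobs(ctx context.Context, n int) <-chan job {
	ch := make(chan job, 1)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			select {
			case ch <- job{id: i}:
			case <-ctx.Done():
				return
			}
		}
	}()
	return ch
}

// consume reads jobs until it has seen limit of them (or the stream
// ends), then cancels the producer via the deferred cancel.
func consume(n, limit int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	seen := 0
	for range streamJobs(ctx, n) {
		seen++
		if seen >= limit {
			break
		}
	}
	return seen
}

func main() {
	fmt.Println(len(buildJobsSlice(5)))
	fmt.Println(consume(100000, 3))
}
```

With the channel shape, a consumer that only needs a few results never pays for building the remaining jobs.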

Other Changes

  • Fixes a bug in searchProgress.internalShouldQuit() where we needed 1 more than the limit to quit
  • Switches totalBlockBytes to be a uint64 throughout
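
The quit-condition fix can be pictured with a small sketch. This is an assumption about the shape of the bug, not the actual `searchProgress` code: the `progress` type, its fields and both methods are hypothetical, and the bug is modelled as a `>` that should have been `>=`.

```go
package main

import "fmt"

// progress is a hypothetical, simplified stand-in for searchProgress.
type progress struct {
	limit   int
	results int
}

// shouldQuitBuggy only reports true once results exceed the limit,
// so one extra result is collected before quitting.
func (p *progress) shouldQuitBuggy() bool { return p.results > p.limit }

// shouldQuitFixed reports true as soon as the limit is reached.
func (p *progress) shouldQuitFixed() bool { return p.results >= p.limit }

// collect adds results one at a time until quit reports true or the
// available results run out, and returns how many were collected.
func collect(limit, available int, quit func(*progress) bool) int {
	p := &progress{limit: limit}
	for i := 0; i < available && !quit(p); i++ {
		p.results++
	}
	return p.results
}

func main() {
	fmt.Println(collect(20, 100, (*progress).shouldQuitBuggy))
	fmt.Println(collect(20, 100, (*progress).shouldQuitFixed))
}
```

Under these assumptions the buggy check collects 21 results for a limit of 20, and the fixed check collects exactly 20.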

Benchmarks:

name                           old time/op    new time/op    delta
SearchSharderRoundTrip5-8         181ms ± 1%       1ms ± 9%  -99.54%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       184ms ± 1%      12ms ± 1%  -93.27%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     318ms ± 4%     472ms ± 2%  +48.65%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op   delta
SearchSharderRoundTrip5-8         118MB ± 0%       0MB ± 0%  -99.62%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       120MB ± 0%       5MB ± 0%  -95.99%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     176MB ± 0%     176MB ± 0%   -0.10%  (p=0.008 n=5+5)

name                           old allocs/op  new allocs/op  delta
SearchSharderRoundTrip5-8         1.15M ± 0%     0.00M ± 2%  -99.88%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       1.17M ± 0%     0.06M ± 0%  -95.12%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     2.24M ± 0%     2.26M ± 0%   +0.89%  (p=0.008 n=5+5)

Impact on exhaustive search with 100k jobs:
[image]

Which issue(s) this PR fixes:
Fixes #2469

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
reqs = append(reqs, subR)

select {
case reqCh <- &backendReqMsg{req: subR}:
Contributor

SearchSharderRoundTrip50000-8     318ms ± 4%     472ms ± 2%  +48.65% 

One thought on this is that we could reduce the channel overhead by sending requests in batches instead of individually. Looking at the code, the easiest split is probably to send all jobs for a block in one channel send here.
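
A minimal sketch of what per-block batching could look like, compared with one send per job. The `job` type and the `sendPerJob` and `sendPerBlock` helpers are hypothetical and only count channel sends; they say nothing about real timings. Batching cuts the number of sends but adds a slice allocation per block.

```go
package main

import "fmt"

// job is a hypothetical stand-in for one backend sub-request.
type job struct{ block, page int }

// sendPerJob performs one channel send per job and returns the number
// of receives and the number of jobs seen by the consumer.
func sendPerJob(blocks, jobsPerBlock int) (sends, jobs int) {
	ch := make(chan job)
	go func() {
		defer close(ch)
		for b := 0; b < blocks; b++ {
			for p := 0; p < jobsPerBlock; p++ {
				ch <- job{block: b, page: p}
			}
		}
	}()
	for range ch {
		sends++
		jobs++
	}
	return sends, jobs
}

// sendPerBlock performs one channel send per block, carrying all of
// that block's jobs in a freshly allocated slice.
func sendPerBlock(blocks, jobsPerBlock int) (sends, jobs int) {
	ch := make(chan []job)
	go func() {
		defer close(ch)
		for b := 0; b < blocks; b++ {
			batch := make([]job, 0, jobsPerBlock)
			for p := 0; p < jobsPerBlock; p++ {
				batch = append(batch, job{block: b, page: p})
			}
			ch <- batch
		}
	}()
	for batch := range ch {
		sends++
		jobs += len(batch)
	}
	return sends, jobs
}

func main() {
	fmt.Println(sendPerJob(3, 4))
	fmt.Println(sendPerBlock(3, 4))
}
```

Both variants deliver the same jobs, so whether batching helps comes down to whether the saved sends outweigh the extra allocations, which only a benchmark can answer.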

joe-elliott (Member, Author) commented Jun 1, 2023

Naive attempt was worse. In the table below, old = this PR, new = this PR with batching as suggested:

name                           old time/op    new time/op    delta
SearchSharderRoundTrip5-8         659µs ± 1%    1108µs ±83%   +68.15%  (p=0.016 n=4+5)
SearchSharderRoundTrip500-8      12.2ms ± 4%    12.9ms ± 6%      ~     (p=0.056 n=5+5)
SearchSharderRoundTrip50000-8     474ms ±11%     540ms ± 8%   +13.91%  (p=0.032 n=5+5)

name                           old alloc/op   new alloc/op   delta
SearchSharderRoundTrip5-8         451kB ± 0%     588kB ±54%   +30.19%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8      4.80MB ± 0%    4.96MB ± 0%    +3.43%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     176MB ± 0%     181MB ± 0%    +3.03%  (p=0.016 n=5+4)

name                           old allocs/op  new allocs/op  delta
SearchSharderRoundTrip5-8         1.33k ± 2%    2.98k ±133%  +123.52%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       57.2k ± 0%     58.0k ± 0%    +1.24%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     2.26M ± 0%     2.28M ± 0%    +0.88%  (p=0.008 n=5+5)

I think the additional memory management offsets it. Personally, I'm not concerned about that roughly +48% on SearchSharderRoundTrip50000. Even in that case the overall performance is going to be significantly better b/c we're getting jobs to queriers faster.

That first benchmark SearchSharderRoundTrip5 is the most interesting b/c it roughly represents "time to first job" which is the real improvement here.

zalegrala (Contributor) left a comment

This looks good to me. Nice changes. I think @mdisibio has an interesting idea.

Successfully merging this pull request may close these issues.

[Search Perf] Improve throughput of backendRequests