-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce impact of backendRequests on latency #2530
Conversation
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
reqs = append(reqs, subR) | ||
|
||
select { | ||
case reqCh <- &backendReqMsg{req: subR}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SearchSharderRoundTrip50000-8 318ms ± 4% 472ms ± 2% +48.65%
One thought on this is we could reduce the channel overhead by sending batched requests instead of individually. Looking at the code the easiest split is probably all jobs for a block in one channel send here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive attempt was worse. before = this PR, after = this PR with batching as suggested:
name old time/op new time/op delta
SearchSharderRoundTrip5-8 659µs ± 1% 1108µs ±83% +68.15% (p=0.016 n=4+5)
SearchSharderRoundTrip500-8 12.2ms ± 4% 12.9ms ± 6% ~ (p=0.056 n=5+5)
SearchSharderRoundTrip50000-8 474ms ±11% 540ms ± 8% +13.91% (p=0.032 n=5+5)
name old alloc/op new alloc/op delta
SearchSharderRoundTrip5-8 451kB ± 0% 588kB ±54% +30.19% (p=0.008 n=5+5)
SearchSharderRoundTrip500-8 4.80MB ± 0% 4.96MB ± 0% +3.43% (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8 176MB ± 0% 181MB ± 0% +3.03% (p=0.016 n=5+4)
name old allocs/op new allocs/op delta
SearchSharderRoundTrip5-8 1.33k ± 2% 2.98k ±133% +123.52% (p=0.008 n=5+5)
SearchSharderRoundTrip500-8 57.2k ± 0% 58.0k ± 0% +1.24% (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8 2.26M ± 0% 2.28M ± 0% +0.88% (p=0.008 n=5+5)
I think the additional memory management offsets it. Personally, i'm not concerned about that +40%. Even in that case the overall performance is going to be significantly better b/c we're getting jobs to queriers faster.
That first benchmark SearchSharderRoundTrip5
is the most interesting b/c it roughly represents "time to first job" which is the real improvement here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Nice changes. I think @mdisibio has an interesting idea.
What this PR does:
Uses a channel to return jobs instead of a returning them as a slice from backendRequests. This nicely improves performance for queries that create a huge number of jobs.
Other Changes
searchProgress.internalShouldQuit()
where we needed 1 more than the limit to quittotalBlockBytes
to be auint64
throughoutBenchmarks:
Impact on exhaustive search with 100k jobs:
Which issue(s) this PR fixes:
Fixes #2469
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]