[dbnode] Optimize block range scan in queryWithSpan #3813

linasm · 2021-10-05T05:51:25Z

What this PR does / why we need it:
The client could query for an arbitrarily wide time range (e.g. 1000 years), which would cause an expensive iteration over the whole range in the innermost loop. This change narrows the scanned range of blocks to at most what is actually covered by the index entry.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE

Does this PR require updating code package or user-facing documentation?:
NONE

codecov · 2021-10-05T10:45:40Z

Codecov Report

Merging #3813 (e40889e) into master (e40889e) will not change coverage.
The diff coverage is n/a.

❗ Current head e40889e differs from pull request most recent head 4c5005f. Consider uploading reports for the commit 4c5005f to get more accurate results

@@          Coverage Diff           @@
##           master   #3813   +/-   ##
======================================
  Coverage    57.0%   57.0%           
======================================
  Files         552     552           
  Lines       63540   63540           
======================================
  Hits        36280   36280           
  Misses      24054   24054           
  Partials     3206    3206

Flag	Coverage Δ
aggregator	`63.3% <0.0%> (ø)`
cluster	`∅ <0.0%> (∅)`
collector	`58.4% <0.0%> (ø)`
dbnode	`60.7% <0.0%> (ø)`
m3em	`46.4% <0.0%> (ø)`
metrics	`19.7% <0.0%> (ø)`
msg	`74.4% <0.0%> (ø)`

Flags with carried forward coverage won't be shown.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e40889e...4c5005f.

ryanhall07

nice find!

src/m3ninx/doc/types.go

ryanhall07 · 2021-10-05T14:52:56Z

src/dbnode/storage/entry.go

@@ -387,6 +393,10 @@ func newEntryIndexState() entryIndexState {
 	}
 }

+func (s *entryIndexState) indexedRangeWithRLock() (xtime.UnixNano, xtime.UnixNano) {


nit: why the separate method instead of inlining in IndexedRange?

IndexedRange is on the Entry struct; indexedRangeWithRLock is on entryIndexState.

ryanhall07 · 2021-10-05T14:56:56Z

guess we have some work to do over the next 1000 years!

src/dbnode/storage/entry.go

nbroyles · 2021-10-21T18:19:10Z

src/dbnode/storage/index/block.go

+			if currentBlock.Before(minIndexed) {
+				currentBlock = minIndexed
+			}
+			maxIndexedExclusive := maxIndexed.Add(time.Nanosecond)


Why add the nanosecond?

To convert the inclusive timestamp to an exclusive one, so that we can compare endExclusive against maxIndexedExclusive and replace it in the subsequent if.

nbroyles · 2021-10-21T18:34:00Z

src/dbnode/storage/index/block.go

+				endExclusive = maxIndexedExclusive
+			}
+
+			for !inBlock && currentBlock.Before(endExclusive) {


So, I don't quite follow this for loop. queryWithSpan is called on a block and takes a queryIter which iterates over index results within the same block. Given that, why do we need to do this loop over every block in the query range to check and see if the doc is indexed? Don't we only need to check that the block itself is within the query range and that doc is indexed for this same block (which it should be since it's in the queryIter)? I'm not super familiar with the intricacies of the read path, so could be misunderstanding here.

TBH I'm not really familiar with this aspect, either. I saw the opportunity to optimize this code without affecting its semantics, in which case I don't have to fully understand the context of it.
I think the best bet is to ask @robskillington who wrote the original code to shed some light on the purpose of this loop.

Hm, this loop behaves slightly differently than before. Previously, the loop iterated at least once; now it might not iterate at all. While the current implementation is more correct, maybe the less correct version was necessary for proper functioning? (Though the conditions under which the behavior would differ seem too rare for this to matter.)

You mean the case when start == endInclusive? I think this was a bug, and it would have been difficult to replicate with the optimized version, so I chose to fix it. I believe this is not a realistic edge case; also, this is certainly not the issue that we could have seen.
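The behavioral difference discussed in this thread can be shown with two generic loop shapes. The original loop is not quoted in this conversation, so both functions below are hypothetical illustrations on plain `int64` values rather than reconstructions of the M3 code.

```go
package main

import "fmt"

// countCheckAfter runs the body before testing the bound, so it always
// visits at least one block, even for an empty range.
func countCheckAfter(start, endExclusive, step int64) int {
	n := 0
	cur := start
	for {
		n++ // body executes before any bounds check
		cur += step
		if cur >= endExclusive {
			break
		}
	}
	return n
}

// countCheckBefore tests the bound first, so an empty range visits nothing.
func countCheckBefore(start, endExclusive, step int64) int {
	n := 0
	for cur := start; cur < endExclusive; cur += step {
		n++
	}
	return n
}

func main() {
	// Non-empty range: both shapes agree.
	fmt.Println(countCheckAfter(0, 4, 2), countCheckBefore(0, 4, 2)) // prints "2 2"
	// Empty range (start equals end): only the first shape visits a block.
	fmt.Println(countCheckAfter(0, 0, 2), countCheckBefore(0, 0, 2)) // prints "1 0"
}
```

The two shapes differ only when the range is empty, which is consistent with the view above that the difference appears only in a rare edge case.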

linasm added 11 commits October 5, 2021 08:47

[dbnode] Optimize block range scan in queryWithSpan

2327419

TestEntryIndexedRange

0a7db36

Fix tests

f341b43

TestBlockE2EInsertAddResultsQueryNarrowingBlockRange

60f500d

lint

66a94fb

fmt

15dfb50

Fix TestNamespaceForwardIndexInsertQuery

7c4a702

Fix TestNamespaceIndexHighConcurrentQueries*

399e276

Fix TestIndexMultipleBlockQuery

0cf97a6

Fix edge case

e2590eb

Fix TestNamespaceIndexInsertQuery

79c1ef9

linasm changed the title ~~WIP [dbnode] Optimize block range scan in queryWithSpan~~ [dbnode] Optimize block range scan in queryWithSpan Oct 5, 2021

linasm marked this pull request as ready for review October 5, 2021 10:46

linasm requested review from robskillington, ryanhall07 and rallen090 October 5, 2021 10:47

ryanhall07 approved these changes Oct 5, 2021

rallen090 reviewed Oct 5, 2021

src/dbnode/storage/entry.go (outdated, resolved)

rallen090 approved these changes Oct 5, 2021

linasm and others added 2 commits October 6, 2021 08:21

Merge branch 'master' into linasm/optimize-queryWithSpan-block-scan

0a65bd3

Address PR feedback

4c5005f

linasm enabled auto-merge (squash) October 6, 2021 05:29

linasm merged commit e0a3682 into master Oct 6, 2021

linasm deleted the linasm/optimize-queryWithSpan-block-scan branch October 6, 2021 05:45

nbroyles reviewed Oct 21, 2021

linasm mentioned this pull request Oct 29, 2021

sudden increase in CPU across all nodes in a cluster causing query failure #3878

Closed
