[dbnode] Optimize block range scan in queryWithSpan #3813

Merged
linasm merged 13 commits into master from linasm/optimize-queryWithSpan-block-scan
Oct 6, 2021

linasm
Collaborator

@linasm linasm commented Oct 5, 2021

What this PR does / why we need it:
The client could query for an arbitrarily wide time range (e.g. 1000 years), which would cause an expensive iteration over the whole range in the innermost loop. This change narrows the scanned range of blocks to at most what is actually covered by the index entry.
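
A minimal sketch of the narrowing described above, not the actual M3 code. It assumes the index entry exposes an inclusive range of indexed block starts; `unixNano`, `blockSize`, `clampToIndexed` and `countBlocks` are hypothetical stand-ins introduced only for illustration (`unixNano` loosely mirrors `xtime.UnixNano`).

```go
package main

import (
	"fmt"
	"time"
)

// unixNano is a hypothetical stand-in for xtime.UnixNano (nanoseconds since epoch).
type unixNano int64

func (t unixNano) Before(o unixNano) bool       { return t < o }
func (t unixNano) Add(d time.Duration) unixNano { return t + unixNano(d) }

// blockSize is an assumed index block size, chosen only for this example.
const blockSize = 2 * time.Hour

// clampToIndexed narrows the query range [start, endExclusive) to the part that
// overlaps the inclusive indexed range [minIndexed, maxIndexed].
func clampToIndexed(start, endExclusive, minIndexed, maxIndexed unixNano) (unixNano, unixNano) {
	if start.Before(minIndexed) {
		start = minIndexed
	}
	// Convert the inclusive upper bound to an exclusive one before comparing.
	if maxExclusive := maxIndexed.Add(time.Nanosecond); maxExclusive.Before(endExclusive) {
		endExclusive = maxExclusive
	}
	return start, endExclusive
}

// countBlocks counts how many block starts a scan over [start, endExclusive) visits.
func countBlocks(start, endExclusive unixNano) int {
	n := 0
	for b := start; b.Before(endExclusive); b = b.Add(blockSize) {
		n++
	}
	return n
}

func main() {
	queryStart := unixNano(0)
	queryEnd := unixNano(1000 * blockSize) // a very wide query range
	minIndexed := unixNano(10 * blockSize) // first indexed block start
	maxIndexed := unixNano(12 * blockSize) // last indexed block start (inclusive)

	start, end := clampToIndexed(queryStart, queryEnd, minIndexed, maxIndexed)
	fmt.Println("blocks scanned without clamping:", countBlocks(queryStart, queryEnd))
	fmt.Println("blocks scanned with clamping:", countBlocks(start, end))
}
```

With these made-up numbers the scan drops from 1000 block checks to 3, which is the kind of saving the description refers to.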

Special notes for your reviewer:
[Screenshot 2021-10-04 at 17 24 09]

Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE

Does this PR require updating code package or user-facing documentation?:
NONE

@codecov

codecov bot commented Oct 5, 2021

Codecov Report

Merging #3813 (e40889e) into master (e40889e) will not change coverage.
The diff coverage is n/a.

❗ Current head e40889e differs from pull request most recent head 4c5005f. Consider uploading reports for the commit 4c5005f to get more accurate results

@@          Coverage Diff           @@
##           master   #3813   +/-   ##
======================================
  Coverage    57.0%   57.0%           
======================================
  Files         552     552           
  Lines       63540   63540           
======================================
  Hits        36280   36280           
  Misses      24054   24054           
  Partials     3206    3206           
Flag         Coverage Δ
aggregator   63.3% <0.0%> (ø)
cluster      ∅ <0.0%> (∅)
collector    58.4% <0.0%> (ø)
dbnode       60.7% <0.0%> (ø)
m3em         46.4% <0.0%> (ø)
metrics      19.7% <0.0%> (ø)
msg          74.4% <0.0%> (ø)

Flags with carried forward coverage won't be shown.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e40889e...4c5005f.

@linasm linasm changed the title WIP [dbnode] Optimize block range scan in queryWithSpan [dbnode] Optimize block range scan in queryWithSpan Oct 5, 2021
@linasm linasm marked this pull request as ready for review October 5, 2021 10:46
Collaborator

@ryanhall07 ryanhall07 left a comment

nice find!

src/m3ninx/doc/types.go (resolved)
@@ -387,6 +393,10 @@ func newEntryIndexState() entryIndexState {
}
}

func (s *entryIndexState) indexedRangeWithRLock() (xtime.UnixNano, xtime.UnixNano) {
Collaborator

nit: why the separate method instead of inlining in IndexedRange ?

Collaborator Author

IndexedRange is on Entry struct, indexedRangeWithRLock is on entryIndexState.
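
The split described here is a common Go locking pattern: the exported method takes the lock, and a `...WithRLock` helper on the inner state assumes the caller already holds it. The sketch below is only an illustration under assumptions; the field names, the `int64` timestamps and the exact placement of the mutex are hypothetical and not taken from the M3 source.

```go
package main

import (
	"fmt"
	"sync"
)

// entryIndexState holds the indexed range; fields are hypothetical.
type entryIndexState struct {
	minIndexed int64
	maxIndexed int64
}

// indexedRangeWithRLock must only be called while the owner's read lock is held.
func (s *entryIndexState) indexedRangeWithRLock() (int64, int64) {
	return s.minIndexed, s.maxIndexed
}

// Entry owns the lock that guards its index state.
type Entry struct {
	mu    sync.RWMutex
	state entryIndexState
}

// IndexedRange acquires the read lock and delegates to the lock-free helper.
func (e *Entry) IndexedRange() (int64, int64) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.state.indexedRangeWithRLock()
}

func main() {
	e := &Entry{state: entryIndexState{minIndexed: 10, maxIndexed: 30}}
	min, max := e.IndexedRange()
	fmt.Println(min, max)
}
```

Keeping the helper on the state type lets other methods that already hold the lock reuse it without re-locking.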

@ryanhall07
Collaborator

guess we have some work to do over the next 1000 years!

@linasm linasm enabled auto-merge (squash) October 6, 2021 05:29
@linasm linasm merged commit e0a3682 into master Oct 6, 2021
@linasm linasm deleted the linasm/optimize-queryWithSpan-block-scan branch October 6, 2021 05:45
if currentBlock.Before(minIndexed) {
currentBlock = minIndexed
}
maxIndexedExclusive := maxIndexed.Add(time.Nanosecond)
Collaborator

Why add the nanosecond?

Collaborator Author

To convert the inclusive timestamp to an exclusive one, so that we can compare endExclusive against maxIndexedExclusive and replace it in the subsequent if.
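
A small sketch of why adding one nanosecond works, assuming timestamps have nanosecond resolution: for any such timestamp t, "t is not after the inclusive bound" and "t is before the bound plus one nanosecond" are the same test. `unixNano` and `toExclusive` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// unixNano is a hypothetical stand-in for xtime.UnixNano.
type unixNano int64

func (t unixNano) Before(o unixNano) bool       { return t < o }
func (t unixNano) After(o unixNano) bool        { return t > o }
func (t unixNano) Add(d time.Duration) unixNano { return t + unixNano(d) }

// toExclusive turns an inclusive upper bound into the equivalent exclusive one.
func toExclusive(maxInclusive unixNano) unixNano {
	return maxInclusive.Add(time.Nanosecond)
}

func main() {
	maxIndexed := unixNano(100)
	maxIndexedExclusive := toExclusive(maxIndexed)
	for _, t := range []unixNano{99, 100, 101} {
		inclusive := !t.After(maxIndexed)          // t <= maxIndexed
		exclusive := t.Before(maxIndexedExclusive) // t < maxIndexed + 1ns
		fmt.Println(t, inclusive, exclusive)       // the two checks always agree
	}
}
```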

endExclusive = maxIndexedExclusive
}

for !inBlock && currentBlock.Before(endExclusive) {
Collaborator

@nbroyles nbroyles Oct 21, 2021

So, I don't quite follow this for loop. queryWithSpan is called on a block and takes a queryIter which iterates over index results within the same block. Given that, why do we need to do this loop over every block in the query range to check and see if the doc is indexed? Don't we only need to check that the block itself is within the query range and that doc is indexed for this same block (which it should be since it's in the queryIter)? I'm not super familiar with the intricacies of the read path, so could be misunderstanding here.

Collaborator Author

TBH I'm not really familiar with this aspect, either. I saw the opportunity to optimize this code without affecting its semantics, in which case I don't have to fully understand the context of it.
I think the best bet is to ask @robskillington who wrote the original code to shed some light on the purpose of this loop.

Collaborator

@vpranckaitis vpranckaitis Oct 22, 2021

Hm, this loop behaves slightly differently than before. Previously, the loop ran at least once; now it might not iterate at all. While the current implementation is more correct, maybe the less correct version was necessary for proper functioning? (Though the conditions under which the behavior would differ seem too rare for this to matter.)

Collaborator Author

You mean the case when start == endInclusive? I think this was a bug, and it would have been difficult to replicate with the optimized version, so I chose to fix it. I believe this is not a realistic edge case, and it is certainly not the issue that we could have seen.
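
The behavioral difference discussed in this thread can be illustrated with two generic loop shapes. This is a hypothetical reconstruction for illustration only, not the code before or after this PR: one loop tests its bound after the body (so it always runs once), the other tests it before (so an empty range runs zero times). Timestamps are plain `int64` values here.

```go
package main

import "fmt"

// visitsPostCheck models a loop that checks the bound after the body,
// so the body runs at least once even for an empty range.
func visitsPostCheck(start, endExclusive, step int64) int {
	n := 0
	cur := start
	for {
		n++ // body: one block check
		cur += step
		if cur >= endExclusive {
			break
		}
	}
	return n
}

// visitsPreCheck models a loop that checks the bound before the body,
// so an empty range results in zero iterations.
func visitsPreCheck(start, endExclusive, step int64) int {
	n := 0
	for cur := start; cur < endExclusive; cur += step {
		n++ // body: one block check
	}
	return n
}

func main() {
	// Non-empty range: both shapes visit the same number of blocks.
	fmt.Println(visitsPostCheck(0, 6, 2), visitsPreCheck(0, 6, 2))
	// Empty range (start equals the end bound): only the post-check loop runs.
	fmt.Println(visitsPostCheck(10, 10, 2), visitsPreCheck(10, 10, 2))
}
```

The two shapes only disagree when the range is empty, which matches the observation above that the differing case should be rare.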

5 participants