Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: store/cachekv: reduce growth factor for iterator ranging using binary searches (backport #10024) #10370

Merged
merged 3 commits into from
Oct 15, 2021

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Oct 14, 2021

This is an automatic backport of pull request #10024 done by Mergify.
Cherry-pick of 3c85944 has failed:

On branch mergify/bp/release/v0.44.x/pr-10024
Your branch is up to date with 'origin/release/v0.44.x'.

You are currently cherry-picking commit 3c8594406.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   store/cachekv/search_test.go
	modified:   store/cachekv/store.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   CHANGELOG.md

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally


Mergify commands and options

More conditions and actions can be found in the documentation.

You can also trigger Mergify actions by commenting on this pull request:

  • @Mergifyio refresh will re-evaluate the rules
  • @Mergifyio rebase will rebase this PR on its base branch
  • @Mergifyio update will merge the base branch into this PR
  • @Mergifyio backport <destination> will backport this PR on <destination> branch

Additionally, on Mergify dashboard you can:

  • look at your merge queues
  • generate the Mergify configuration with the config editor.

Finally, you can contact us on https://mergify.io/

…binary searches (#10024)

This change takes the observation that previous dbm.IsKeyInDomain
which searches for [start, end) was performing too many byteslice
comparisons. Instead we start off by sorting all the values in the
store.unsortedCache, and then apply a modified binary search to
look for values that fall within the domain [start, end)
The procedure involves:
* iterating over all items to build a list of all keys -- O(n)
* invoking sort.Strings immediately, of which
we anyways eventually invoke sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case
* invoking modified binary search which is O(log(n)) * 2 ~ O(log(n))
to search for the [start, end) range indices

for a total approximate complexity of:
Best case:  O(n) + O(n(log(n))) + O(log(n)) ~= O(nlog(n))
Worst case: O(n) + O(n^2) + O(log(n))       ~= O(n^2)

instead of previously:
* iterating over all the unsorted items and invoking dbm.IsKeyInDomain:
bytes.Compare ~ O(n) + O(n*s*e) where s -- len(start), e -- len(end)
for overall complexity of O(n*s*e)
* invoking sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case

for a total approximate complexity of:
Best case:  O(n) + O(n*s*e) + O(nlog(n)) ~= O(n*s*e) ~ O(n^2)
Worst case: O(n) + O(n*s*e) + O(n^2)     ~= O(n*s*e) ~ O(n^2)

Ordinarily we'd combine the n*s*e to be n*m, but really the comparisons
between (start & key, end & key) are profound that it makes sense to
keep them as factors. The overall benchmark results vindicate our choice
of isolating the factors (n*s*e)

The benchmarks show that as the number of keys to iterate grows, the
new code grows gracefully in a somewhat linear growth, notice for
CAcheKVStoreIterator*, when we go from:
* 1,000 to 10,000 keys: 120us->1,600us (13X) old vs 95us->900us (9.47X) new
* 50,000 to 100,000 keys: 19ms->100ms (5.3X) old vs 5.5ms->17ms (3X) new

```shell
time/op
GetValidator-8	              5.8ms ± 2%    4.7ms ± 1%	-17.69%	(p=0.000 n=10+10)
OneBankSendTxPerBlock-8	      3.2ms ± 2%    2.8ms ± 1%	-10.80%	(p=0.000 n=7+10)
OneBankMultiSendTxPerBlock-8  3.1ms ± 3%    2.9ms ± 2%	-8.36%	(p=0.000 n=10+10)
AccountMapperSetAccount-8     8.6µs ± 1%    7.8µs ± 1%	-9.74%	(p=0.000 n=10+10)
CacheKVStoreIterator500-8     64µs ± 6%	    51µs ± 6%	-19.22%	(p=0.000 n=10+9)
CacheKVStoreIterator1000-8    0.12ms ± 4%   95µs ± 4%	-19.55%	(p=0.000 n=10+10)
CacheKVStoreIterator10000-8   1.6ms ± 4%    0.90ms ± 1%	-42.11%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8   19ms ± 5%	    5.5ms ± 1%	-71.35%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8  0.10s ± 23%   17ms ± 7%	-83.44%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8   1.3µs ± 6%    0.90µs ± 3%	-31.19%	(p=0.000 n=9+9)
CacheKVStoreGetKeyFound-8     0.66µs ± 6%   0.56µs ± 2%	-14.81%	(p=0.000 n=10+9)

alloc/op
B/op
BlockProvision-8	     0.11kB ± 0%    0.10kB ± 0%	-7.14%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8  0.89MB ± 6%    0.53MB ± 1%	-40.85%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 6.3MB ± 23%    1.6MB ± 6%	-74.17%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  0.26kB ± 0%    0.23kB ± 1%	-11.53%	(p=0.000 n=10+8)

allocs/op (count)
AccountMapperSetAccount-8    42 ± 0%	    38 ± 0%	-9.52%	(p=0.000 n=10+10)
BlockProvision-8	     6.0 ± 0%	    5.0 ± 0%	-16.67%	(p=0.000 n=10+10)
CacheKVStoreIterator1000-8   14 ± 0%	    13 ± 0%	-7.14%	(p=0.002 n=8+10)
CacheKVStoreIterator10000-8  0.15k ± 2%	    76 ± 1%	-49.00%	(p=0.000 n=7+10)
CacheKVStoreIterator50000-8  8.9k ± 11%	    2.0k ± 2%	-77.60%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 0.10M ± 26%    13k ± 12%	-86.89%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  5.0 ± 0%	    4.0 ± 0%	-20.00%	(p=0.000 n=10+10)
```

Note: Purposefully using a commit off master that doesn't
include the buggy code that caused x/bank.BenchmarkOneBank* to fail
per issue #10023

Updates #9876

/cc @cuonglm @kirbyquerby

<!--
The default pull request template is for types feat, fix, or refactor.
For other templates, add one of the following parameters to the url:
- template=docs.md
- template=other.md
-->

## Description

Closes: #XXXX

<!-- Add a description of the changes that this PR introduces and the files that
are the most critical to review. -->

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [x] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [x] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification
- [ ] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

(cherry picked from commit 3c85944)

# Conflicts:
#	CHANGELOG.md
@mergify mergify bot requested review from aaronc and alexanderbez as code owners October 14, 2021 21:58
@mergify mergify bot added the conflicts label Oct 14, 2021
@tac0turtle tac0turtle added this to the v0.44.3 milestone Oct 15, 2021
@robert-zaremba
Copy link
Collaborator

going to fix the conflict

@robert-zaremba robert-zaremba added the A:automerge Automatically merge PR once all prerequisites pass. label Oct 15, 2021
@robert-zaremba robert-zaremba merged commit 6079fe1 into release/v0.44.x Oct 15, 2021
@robert-zaremba robert-zaremba deleted the mergify/bp/release/v0.44.x/pr-10024 branch October 15, 2021 17:46
evan-forbes pushed a commit to evan-forbes/cosmos-sdk that referenced this pull request Nov 1, 2021
…binary searches (backport cosmos#10024) (cosmos#10370)

* fix!: store/cachekv: reduce growth factor for iterator ranging using binary searches (cosmos#10024)

This change takes the observation that previous dbm.IsKeyInDomain
which searches for [start, end) was performing too many byteslice
comparisons. Instead we start off by sorting all the values in the
store.unsortedCache, and then apply a modified binary search to
look for values that fall within the domain [start, end)
The procedure involves:
* iterating over all items to build a list of all keys -- O(n)
* invoking sort.Strings immediately, of which
we anyways eventually invoke sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case
* invoking modified binary search which is O(log(n)) * 2 ~ O(log(n))
to search for the [start, end) range indices

for a total approximate complexity of:
Best case:  O(n) + O(n(log(n))) + O(log(n)) ~= O(nlog(n))
Worst case: O(n) + O(n^2) + O(log(n))       ~= O(n^2)

instead of previously:
* iterating over all the unsorted items and invoking dbm.IsKeyInDomain:
bytes.Compare ~ O(n) + O(n*s*e) where s -- len(start), e -- len(end)
for overall complexity of O(n*s*e)
* invoking sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case

for a total approximate complexity of:
Best case:  O(n) + O(n*s*e) + O(nlog(n)) ~= O(n*s*e) ~ O(n^2)
Worst case: O(n) + O(n*s*e) + O(n^2)     ~= O(n*s*e) ~ O(n^2)

Ordinarily we'd combine the n*s*e to be n*m, but really the comparisons
between (start & key, end & key) are profound that it makes sense to
keep them as factors. The overall benchmark results vindicate our choice
of isolating the factors (n*s*e)

The benchmarks show that as the number of keys to iterate grows, the
new code grows gracefully in a somewhat linear growth, notice for
CAcheKVStoreIterator*, when we go from:
* 1,000 to 10,000 keys: 120us->1,600us (13X) old vs 95us->900us (9.47X) new
* 50,000 to 100,000 keys: 19ms->100ms (5.3X) old vs 5.5ms->17ms (3X) new

```shell
time/op
GetValidator-8	              5.8ms ± 2%    4.7ms ± 1%	-17.69%	(p=0.000 n=10+10)
OneBankSendTxPerBlock-8	      3.2ms ± 2%    2.8ms ± 1%	-10.80%	(p=0.000 n=7+10)
OneBankMultiSendTxPerBlock-8  3.1ms ± 3%    2.9ms ± 2%	-8.36%	(p=0.000 n=10+10)
AccountMapperSetAccount-8     8.6µs ± 1%    7.8µs ± 1%	-9.74%	(p=0.000 n=10+10)
CacheKVStoreIterator500-8     64µs ± 6%	    51µs ± 6%	-19.22%	(p=0.000 n=10+9)
CacheKVStoreIterator1000-8    0.12ms ± 4%   95µs ± 4%	-19.55%	(p=0.000 n=10+10)
CacheKVStoreIterator10000-8   1.6ms ± 4%    0.90ms ± 1%	-42.11%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8   19ms ± 5%	    5.5ms ± 1%	-71.35%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8  0.10s ± 23%   17ms ± 7%	-83.44%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8   1.3µs ± 6%    0.90µs ± 3%	-31.19%	(p=0.000 n=9+9)
CacheKVStoreGetKeyFound-8     0.66µs ± 6%   0.56µs ± 2%	-14.81%	(p=0.000 n=10+9)

alloc/op
B/op
BlockProvision-8	     0.11kB ± 0%    0.10kB ± 0%	-7.14%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8  0.89MB ± 6%    0.53MB ± 1%	-40.85%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 6.3MB ± 23%    1.6MB ± 6%	-74.17%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  0.26kB ± 0%    0.23kB ± 1%	-11.53%	(p=0.000 n=10+8)

allocs/op (count)
AccountMapperSetAccount-8    42 ± 0%	    38 ± 0%	-9.52%	(p=0.000 n=10+10)
BlockProvision-8	     6.0 ± 0%	    5.0 ± 0%	-16.67%	(p=0.000 n=10+10)
CacheKVStoreIterator1000-8   14 ± 0%	    13 ± 0%	-7.14%	(p=0.002 n=8+10)
CacheKVStoreIterator10000-8  0.15k ± 2%	    76 ± 1%	-49.00%	(p=0.000 n=7+10)
CacheKVStoreIterator50000-8  8.9k ± 11%	    2.0k ± 2%	-77.60%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 0.10M ± 26%    13k ± 12%	-86.89%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  5.0 ± 0%	    4.0 ± 0%	-20.00%	(p=0.000 n=10+10)
```

Note: Purposefully using a commit off master that doesn't
include the buggy code that caused x/bank.BenchmarkOneBank* to fail
per issue cosmos#10023

Updates cosmos#9876

/cc @cuonglm @kirbyquerby

<!--
The default pull request template is for types feat, fix, or refactor.
For other templates, add one of the following parameters to the url:
- template=docs.md
- template=other.md
-->

## Description

Closes: #XXXX

<!-- Add a description of the changes that this PR introduces and the files that
are the most critical to review. -->

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [x] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [x] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification
- [ ] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

(cherry picked from commit 3c85944)

# Conflicts:
#	CHANGELOG.md

* fix conflict

Co-authored-by: Emmanuel T Odeke <[email protected]>
Co-authored-by: marbar3778 <[email protected]>
Co-authored-by: Robert Zaremba <[email protected]>
zakir-code pushed a commit to PundiAI/cosmos-sdk that referenced this pull request Apr 14, 2022
…binary searches (backport cosmos#10024) (cosmos#10370)

* fix!: store/cachekv: reduce growth factor for iterator ranging using binary searches (cosmos#10024)

This change takes the observation that previous dbm.IsKeyInDomain
which searches for [start, end) was performing too many byteslice
comparisons. Instead we start off by sorting all the values in the
store.unsortedCache, and then apply a modified binary search to
look for values that fall within the domain [start, end)
The procedure involves:
* iterating over all items to build a list of all keys -- O(n)
* invoking sort.Strings immediately, of which
we anyways eventually invoke sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case
* invoking modified binary search which is O(log(n)) * 2 ~ O(log(n))
to search for the [start, end) range indices

for a total approximate complexity of:
Best case:  O(n) + O(n(log(n))) + O(log(n)) ~= O(nlog(n))
Worst case: O(n) + O(n^2) + O(log(n))       ~= O(n^2)

instead of previously:
* iterating over all the unsorted items and invoking dbm.IsKeyInDomain:
bytes.Compare ~ O(n) + O(n*s*e) where s -- len(start), e -- len(end)
for overall complexity of O(n*s*e)
* invoking sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case

for a total approximate complexity of:
Best case:  O(n) + O(n*s*e) + O(nlog(n)) ~= O(n*s*e) ~ O(n^2)
Worst case: O(n) + O(n*s*e) + O(n^2)     ~= O(n*s*e) ~ O(n^2)

Ordinarily we'd combine the n*s*e to be n*m, but really the comparisons
between (start & key, end & key) are profound that it makes sense to
keep them as factors. The overall benchmark results vindicate our choice
of isolating the factors (n*s*e)

The benchmarks show that as the number of keys to iterate grows, the
new code grows gracefully in a somewhat linear growth, notice for
CAcheKVStoreIterator*, when we go from:
* 1,000 to 10,000 keys: 120us->1,600us (13X) old vs 95us->900us (9.47X) new
* 50,000 to 100,000 keys: 19ms->100ms (5.3X) old vs 5.5ms->17ms (3X) new

```shell
time/op
GetValidator-8	              5.8ms ± 2%    4.7ms ± 1%	-17.69%	(p=0.000 n=10+10)
OneBankSendTxPerBlock-8	      3.2ms ± 2%    2.8ms ± 1%	-10.80%	(p=0.000 n=7+10)
OneBankMultiSendTxPerBlock-8  3.1ms ± 3%    2.9ms ± 2%	-8.36%	(p=0.000 n=10+10)
AccountMapperSetAccount-8     8.6µs ± 1%    7.8µs ± 1%	-9.74%	(p=0.000 n=10+10)
CacheKVStoreIterator500-8     64µs ± 6%	    51µs ± 6%	-19.22%	(p=0.000 n=10+9)
CacheKVStoreIterator1000-8    0.12ms ± 4%   95µs ± 4%	-19.55%	(p=0.000 n=10+10)
CacheKVStoreIterator10000-8   1.6ms ± 4%    0.90ms ± 1%	-42.11%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8   19ms ± 5%	    5.5ms ± 1%	-71.35%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8  0.10s ± 23%   17ms ± 7%	-83.44%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8   1.3µs ± 6%    0.90µs ± 3%	-31.19%	(p=0.000 n=9+9)
CacheKVStoreGetKeyFound-8     0.66µs ± 6%   0.56µs ± 2%	-14.81%	(p=0.000 n=10+9)

alloc/op
B/op
BlockProvision-8	     0.11kB ± 0%    0.10kB ± 0%	-7.14%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8  0.89MB ± 6%    0.53MB ± 1%	-40.85%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 6.3MB ± 23%    1.6MB ± 6%	-74.17%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  0.26kB ± 0%    0.23kB ± 1%	-11.53%	(p=0.000 n=10+8)

allocs/op (count)
AccountMapperSetAccount-8    42 ± 0%	    38 ± 0%	-9.52%	(p=0.000 n=10+10)
BlockProvision-8	     6.0 ± 0%	    5.0 ± 0%	-16.67%	(p=0.000 n=10+10)
CacheKVStoreIterator1000-8   14 ± 0%	    13 ± 0%	-7.14%	(p=0.002 n=8+10)
CacheKVStoreIterator10000-8  0.15k ± 2%	    76 ± 1%	-49.00%	(p=0.000 n=7+10)
CacheKVStoreIterator50000-8  8.9k ± 11%	    2.0k ± 2%	-77.60%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 0.10M ± 26%    13k ± 12%	-86.89%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  5.0 ± 0%	    4.0 ± 0%	-20.00%	(p=0.000 n=10+10)
```

Note: Purposefully using a commit off master that doesn't
include the buggy code that caused x/bank.BenchmarkOneBank* to fail
per issue cosmos#10023

Updates cosmos#9876

/cc @cuonglm @kirbyquerby

<!--
The default pull request template is for types feat, fix, or refactor.
For other templates, add one of the following parameters to the url:
- template=docs.md
- template=other.md
-->

## Description

Closes: #XXXX

<!-- Add a description of the changes that this PR introduces and the files that
are the most critical to review. -->

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [x] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [x] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification
- [ ] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

(cherry picked from commit 3c85944)

# Conflicts:
#	CHANGELOG.md

* fix conflict

Co-authored-by: Emmanuel T Odeke <[email protected]>
Co-authored-by: marbar3778 <[email protected]>
Co-authored-by: Robert Zaremba <[email protected]>
JeancarloBarrios pushed a commit to agoric-labs/cosmos-sdk that referenced this pull request Sep 28, 2024
…binary searches (backport cosmos#10024) (cosmos#10370)

* fix!: store/cachekv: reduce growth factor for iterator ranging using binary searches (cosmos#10024)

This change takes the observation that previous dbm.IsKeyInDomain
which searches for [start, end) was performing too many byteslice
comparisons. Instead we start off by sorting all the values in the
store.unsortedCache, and then apply a modified binary search to
look for values that fall within the domain [start, end)
The procedure involves:
* iterating over all items to build a list of all keys -- O(n)
* invoking sort.Strings immediately, of which
we anyways eventually invoke sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case
* invoking modified binary search which is O(log(n)) * 2 ~ O(log(n))
to search for the [start, end) range indices

for a total approximate complexity of:
Best case:  O(n) + O(n(log(n))) + O(log(n)) ~= O(nlog(n))
Worst case: O(n) + O(n^2) + O(log(n))       ~= O(n^2)

instead of previously:
* iterating over all the unsorted items and invoking dbm.IsKeyInDomain:
bytes.Compare ~ O(n) + O(n*s*e) where s -- len(start), e -- len(end)
for overall complexity of O(n*s*e)
* invoking sort.Slice(unsorted, ...) which uses
Quicksort -- O(nlog(n)) or O(n^2) worst case

for a total approximate complexity of:
Best case:  O(n) + O(n*s*e) + O(nlog(n)) ~= O(n*s*e) ~ O(n^2)
Worst case: O(n) + O(n*s*e) + O(n^2)     ~= O(n*s*e) ~ O(n^2)

Ordinarily we'd combine the n*s*e to be n*m, but really the comparisons
between (start & key, end & key) are profound that it makes sense to
keep them as factors. The overall benchmark results vindicate our choice
of isolating the factors (n*s*e)

The benchmarks show that as the number of keys to iterate grows, the
new code grows gracefully in a somewhat linear growth, notice for
CAcheKVStoreIterator*, when we go from:
* 1,000 to 10,000 keys: 120us->1,600us (13X) old vs 95us->900us (9.47X) new
* 50,000 to 100,000 keys: 19ms->100ms (5.3X) old vs 5.5ms->17ms (3X) new

```shell
time/op
GetValidator-8	              5.8ms ± 2%    4.7ms ± 1%	-17.69%	(p=0.000 n=10+10)
OneBankSendTxPerBlock-8	      3.2ms ± 2%    2.8ms ± 1%	-10.80%	(p=0.000 n=7+10)
OneBankMultiSendTxPerBlock-8  3.1ms ± 3%    2.9ms ± 2%	-8.36%	(p=0.000 n=10+10)
AccountMapperSetAccount-8     8.6µs ± 1%    7.8µs ± 1%	-9.74%	(p=0.000 n=10+10)
CacheKVStoreIterator500-8     64µs ± 6%	    51µs ± 6%	-19.22%	(p=0.000 n=10+9)
CacheKVStoreIterator1000-8    0.12ms ± 4%   95µs ± 4%	-19.55%	(p=0.000 n=10+10)
CacheKVStoreIterator10000-8   1.6ms ± 4%    0.90ms ± 1%	-42.11%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8   19ms ± 5%	    5.5ms ± 1%	-71.35%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8  0.10s ± 23%   17ms ± 7%	-83.44%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8   1.3µs ± 6%    0.90µs ± 3%	-31.19%	(p=0.000 n=9+9)
CacheKVStoreGetKeyFound-8     0.66µs ± 6%   0.56µs ± 2%	-14.81%	(p=0.000 n=10+9)

alloc/op
B/op
BlockProvision-8	     0.11kB ± 0%    0.10kB ± 0%	-7.14%	(p=0.000 n=10+10)
CacheKVStoreIterator50000-8  0.89MB ± 6%    0.53MB ± 1%	-40.85%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 6.3MB ± 23%    1.6MB ± 6%	-74.17%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  0.26kB ± 0%    0.23kB ± 1%	-11.53%	(p=0.000 n=10+8)

allocs/op (count)
AccountMapperSetAccount-8    42 ± 0%	    38 ± 0%	-9.52%	(p=0.000 n=10+10)
BlockProvision-8	     6.0 ± 0%	    5.0 ± 0%	-16.67%	(p=0.000 n=10+10)
CacheKVStoreIterator1000-8   14 ± 0%	    13 ± 0%	-7.14%	(p=0.002 n=8+10)
CacheKVStoreIterator10000-8  0.15k ± 2%	    76 ± 1%	-49.00%	(p=0.000 n=7+10)
CacheKVStoreIterator50000-8  8.9k ± 11%	    2.0k ± 2%	-77.60%	(p=0.000 n=10+10)
CacheKVStoreIterator100000-8 0.10M ± 26%    13k ± 12%	-86.89%	(p=0.000 n=10+10)
CacheKVStoreGetNoKeyFound-8  5.0 ± 0%	    4.0 ± 0%	-20.00%	(p=0.000 n=10+10)
```

Note: Purposefully using a commit off master that doesn't
include the buggy code that caused x/bank.BenchmarkOneBank* to fail
per issue cosmos#10023

Updates cosmos#9876

/cc @cuonglm @kirbyquerby

<!--
The default pull request template is for types feat, fix, or refactor.
For other templates, add one of the following parameters to the url:
- template=docs.md
- template=other.md
-->

## Description

Closes: #XXXX

<!-- Add a description of the changes that this PR introduces and the files that
are the most critical to review. -->

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [x] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [x] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification
- [ ] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

(cherry picked from commit 3c85944)

# Conflicts:
#	CHANGELOG.md

* fix conflict

Co-authored-by: Emmanuel T Odeke <[email protected]>
Co-authored-by: marbar3778 <[email protected]>
Co-authored-by: Robert Zaremba <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:automerge Automatically merge PR once all prerequisites pass.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants