Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: do not fetch all tables during every query. #1558

Merged
merged 1 commit into from
Apr 8, 2022

Conversation

Little-Wallace
Copy link
Contributor

@Little-Wallace Little-Wallace commented Apr 2, 2022

Signed-off-by: Little-Wallace [email protected]

What's changed and what's your intention?

close #1507
Although the meta most of tables will be cached, if there are too many files in the whole cluster , the bloom-filter-data would cost a lot of memory. In fact, most of request only concerned about a few data in a small range of the current executor , so they do not need to fetch all table from the table-cache.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

@Little-Wallace
Copy link
Contributor Author

I will add some unit test to ensure correctness and add more benchmark for iterator test.

@TennyZhuang TennyZhuang requested a review from hzxa21 April 2, 2022 07:56
@StrikeW
Copy link
Contributor

StrikeW commented Apr 2, 2022

Better to add some comments in the code😊

.table_infos
.iter()
.map(|sst| sst.id)
.collect_vec()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to collect here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand ....

rust/meta/src/hummock/compaction.rs Outdated Show resolved Hide resolved
rust/storage/src/hummock/mod.rs Outdated Show resolved Hide resolved
@Little-Wallace Little-Wallace force-pushed the wallace/meta branch 2 times, most recently from 1265e6c to 92b7e33 Compare April 6, 2022 07:48
@twocode twocode requested a review from soundOfDestiny April 6, 2022 09:00
Copy link
Contributor

@soundOfDestiny soundOfDestiny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are get_compact_task and report_compact_tasks calls here proper? better ask @zwang28

src/bench/ss_bench/operations/write_batch.rs Show resolved Hide resolved
Copy link
Contributor

@soundOfDestiny soundOfDestiny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does first level have to be Overlapping?

src/storage/src/hummock/mock/mock_hummock_meta_client.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@soundOfDestiny soundOfDestiny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if compact task failed?

src/storage/src/hummock/mock/mock_hummock_meta_service.rs Outdated Show resolved Hide resolved
src/storage/src/hummock/mod.rs Show resolved Hide resolved
src/storage/src/hummock/mod.rs Outdated Show resolved Hide resolved
src/storage/src/hummock/mod.rs Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Apr 7, 2022

Codecov Report

Merging #1558 (1e97104) into main (fbaad8f) will decrease coverage by 0.06%.
The diff coverage is 56.02%.

@@             Coverage Diff              @@
##               main    #1558      +/-   ##
============================================
- Coverage     69.91%   69.85%   -0.07%     
  Complexity     2766     2766              
============================================
  Files          1052     1052              
  Lines         92464    92678     +214     
  Branches       1790     1790              
============================================
+ Hits          64650    64741      +91     
- Misses        26923    27046     +123     
  Partials        891      891              
Flag Coverage Δ
java 61.01% <ø> (ø)
rust 71.68% <56.02%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/bench/ss_bench/main.rs 0.81% <0.00%> (-0.12%) ⬇️
src/bench/ss_bench/operations/mod.rs 0.00% <0.00%> (ø)
src/bench/ss_bench/operations/write_batch.rs 0.00% <0.00%> (ø)
src/meta/src/hummock/mock_hummock_meta_client.rs 37.50% <0.00%> (-13.23%) ⬇️
src/meta/src/hummock/mod.rs 30.27% <0.00%> (-2.40%) ⬇️
src/meta/src/hummock/compaction.rs 77.28% <41.37%> (-2.61%) ⬇️
src/storage/src/hummock/utils.rs 87.80% <63.63%> (-8.87%) ⬇️
src/storage/src/hummock/local_version_manager.rs 85.76% <66.66%> (+1.17%) ⬆️
src/storage/src/hummock/mod.rs 72.40% <68.90%> (-1.27%) ⬇️
src/storage/src/hummock/snapshot_tests.rs 96.69% <91.66%> (-0.68%) ⬇️
... and 28 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@Little-Wallace
Copy link
Contributor Author

Little-Wallace commented Apr 7, 2022

I run a benchmark to test the filter effect for latency.

benchmark command

 cargo run --release --bin ss-bench -- \
 --benchmarks "writebatch,prefixscanrandom"  --batch-size 100000 \
 --writes 500000 \
 --reads 0 \
 --scans 10000 \
 --deletes 0 \
 --concurrency-num 1 \
 --seed 233 \
 --statistics --store "hummock+minio://minioadmin:[email protected]:9000/bucket" --compact-level-after-write 1 --table-size-mb=2

benchmark result

current PR
prefixscanrandom
latency:
min: 8.416 micro sec,
mean: 13.158 micro sec,
p50: 9.541 micro sec,
p90: 18.250 micro sec,
p99: 24.208 micro sec,
max: 797.125 micro sec,
std_dev: 21845.084;
QPS: 0 1772720 bytes/sec

main
prefixscanrandom
latency:
min: 30.208 micro sec,
mean: 38.120 micro sec,
p50: 33.083 micro sec,
p90: 47.708 micro sec,
p99: 73.958 micro sec,
max: 11.422 milli sec,
std_dev: 114168.059;
QPS: 0 623231 bytes/sec

Because the main branch can not run a iterator bench after compaction, I commented out some code from the current branch to replace the main branch.

            let mut overlapped_sstable_iters = vec![];
            for level in &version.levels() {
                let table_ids = level
                    .table_infos
                    .iter()
                    // .filter(|info| {
                    //     let table_range = info.key_range.as_ref().unwrap();
                    //     let table_start = user_key(table_range.left.as_slice());
                    //     let table_end = user_key(table_range.right.as_slice());
                    //     range_overlap(&key_range, table_start, table_end, false)
                    // })
                    .map(|info| info.id)
                    .collect_vec();
                if table_ids.is_empty() {
                    continue;
                }
                table_count += table_ids.len();
                let tables = self
                    .local_version_manager
                    .pick_few_tables(&table_ids)
                    .await?;
                // match level.level_type() {
                //     LevelType::Overlapping => {
                        for table in tables.into_iter().rev() {
                            overlapped_sstable_iters.push(Box::new(SSTableIterator::new(
                                table,
                                self.sstable_store.clone(),
                            ))
                                as BoxedHummockIterator);
                        }
                //     }
                //     LevelType::Nonoverlapping => overlapped_sstable_iters.push(Box::new(
                //         ConcatIterator::new(tables, self.sstable_store.clone()),
                //     )),
                // }
            }

commit 80a66b5f81fb6c8662bcff7a792d4487d3cb7027
Author: Little-Wallace <[email protected]>
Date:   Thu Apr 7 18:44:41 2022 +0800

        feat(frontend): show command (#1545)

        1.support show command (databases, schemas, tables, mviews, columns) and describe table or source.
        ```
        SHOW DATABASES
        SHOW SCHEMAS
        SHOW COLUMNS FROM <TABLE | SOURCE>
        DESCRIBE <TABLE | SOURCE>
        SHOW TABLES [FROM SCHEMA]
        SHOW MATERIALIZED VIEWS [FROM SCHEMA]
        ```
        2.remove show table and show source change to describe and show columns.

    commit 7403271
    Author: zwang28 <[email protected]>
    Date:   Thu Apr 7 11:03:44 2022 +0800

        chore: fix e2e command in doc. (#1626)

    commit fa2bff4
    Author: Alex Chi <[email protected]>
    Date:   Thu Apr 7 10:50:02 2022 +0800

        docs: add getting started guide (#1625)

        Signed-off-by: Alex Chi <[email protected]>

    Signed-off-by: Little-Wallace <[email protected]>

commit 4a13f7eb21b7fd58d4bfaced84fee71fa0834a94
Author: Little-Wallace <[email protected]>
Date:   Thu Apr 7 16:18:19 2022 +0800

    fix clippy

    Signed-off-by: Little-Wallace <[email protected]>

commit 8d206d98e254a01e568965e6ee96f14e3f36cf42
Author: Little-Wallace <[email protected]>
Date:   Thu Apr 7 15:53:04 2022 +0800

    add more test

    Signed-off-by: Little-Wallace <[email protected]>

commit 4432ab7337419b4ef7dbb267f470294ab751b2a7
Author: Little-Wallace <[email protected]>
Date:   Thu Apr 7 14:47:17 2022 +0800

    fix test

    Signed-off-by: Little-Wallace <[email protected]>

commit cf075a762d660109797206ba52a921fa369663b7
Author: Little-Wallace <[email protected]>
Date:   Thu Apr 7 14:37:09 2022 +0800

    tmp

    Signed-off-by: Little-Wallace <[email protected]>

commit 949a85f026eb8a3c25584775ab39b83d05197f61
Author: Little-Wallace <[email protected]>
Date:   Thu Apr 7 11:50:35 2022 +0800

    fix clippy

    Signed-off-by: Little-Wallace <[email protected]>

commit 94cbe1a4f6f4e01d706e63e812460fde2e7d6b2b
Author: Little-Wallace <[email protected]>
Date:   Thu Apr 7 11:28:53 2022 +0800

    add some note

    Signed-off-by: Little-Wallace <[email protected]>
Copy link
Contributor

@soundOfDestiny soundOfDestiny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Little-Wallace Little-Wallace merged commit 8316499 into main Apr 8, 2022
@Little-Wallace Little-Wallace deleted the wallace/meta branch April 8, 2022 02:33
@@ -42,6 +42,44 @@ impl MockHummockMetaClient {
context_id,
}
}

pub async fn get_compact_task(&self, target_level: u32) -> Option<CompactTask> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a get_compact_task method in HummockManager, can it be used directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But there is a difficult that HummockManager trigger it automatic but i need a manual interface

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But there is a difficult that HummockManager trigger it automatic but i need a manual interface

HummockManager::get_compact_task itself is manual and we do have a reference to hummock_manager in MockHummockMetaClient.
The automatic compact trigger will only be started along with meta node, which is not the case here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will remove it in #1694. please take a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor HummockVersion to avoid fetch all tables meta
7 participants