Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hummock): make meta to be block and fit into block cache #732

Closed
wants to merge 8 commits into from

Conversation

MrCroxx
Copy link
Contributor

@MrCroxx MrCroxx commented Mar 7, 2022

What's changed and what's your intention?

Make SST meta to be block and fit into block cache.

TODOs(another PRs):

  • refactor block struct

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

#537

@MrCroxx MrCroxx mentioned this pull request Mar 7, 2022
6 tasks
@MrCroxx MrCroxx self-assigned this Mar 7, 2022
@MrCroxx MrCroxx requested review from BugenZhao, zwang28 and hzxa21 March 7, 2022 11:58
@codecov
Copy link

codecov bot commented Mar 7, 2022

Codecov Report

Merging #732 (3a685d4) into main (1f8f171) will increase coverage by 0.00%.
The diff coverage is 86.23%.

Impacted file tree graph

@@            Coverage Diff            @@
##               main     #732   +/-   ##
=========================================
  Coverage     71.73%   71.74%           
  Complexity     2706     2706           
=========================================
  Files           901      901           
  Lines         52333    52386   +53     
  Branches       1781     1781           
=========================================
+ Hits          37540    37583   +43     
- Misses        13978    13988   +10     
  Partials        815      815           
Flag Coverage Δ
java 58.86% <ø> (ø)
rust 77.26% <86.23%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
rust/common/src/types/mod.rs 66.32% <ø> (ø)
rust/storage/src/hummock/compactor.rs 63.08% <ø> (ø)
rust/storage/src/hummock/error.rs 19.04% <0.00%> (-2.01%) ⬇️
rust/storage/src/store_impl.rs 4.76% <0.00%> (-0.12%) ⬇️
rust/storage/src/hummock/local_version_manager.rs 91.91% <66.66%> (-0.87%) ⬇️
rust/storage/src/hummock/sstable_store.rs 75.00% <76.19%> (-3.58%) ⬇️
rust/storage/src/hummock/sstable/mod.rs 85.24% <95.65%> (+6.29%) ⬆️
rust/meta/src/hummock/integration_tests.rs 100.00% <100.00%> (ø)
rust/meta/src/hummock/test_utils.rs 100.00% <100.00%> (ø)
rust/storage/src/hummock/iterator/reverse_user.rs 99.23% <100.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f8f171...3a685d4. Read the comment docs.

@MrCroxx
Copy link
Contributor Author

MrCroxx commented Mar 7, 2022

Seems decoding SstableMeta consumes a lot of time. I'll investigate it tomorrow.

@@ -64,8 +61,12 @@ impl SstableStore {
let block = Block::decode(data.slice(offset..offset + len), offset)?;
self.block_cache
.insert(sst.id, block_idx as u64, block)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert!(block_idx != META_BLOCK_IDX)?

Block::decode(buf, 0)
};

let meta_block = match policy {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should meta block cache entry live as long as the data block cache entries of the same SST?

random question: what is the benefit of putting meta and data into two different files? Can we put meta block as the first block of SST and use S3 range GET to fetch meta block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should meta block cache entry live as long as the data block cache entries of the same SST?

For the first question, as @TennyZhuang explained, meta blocks will be touched more than data blocks, and not frequently touched meta blocks are supposed to be evicted by design. The behavior is not different from the normal block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second question, as I benched latencies of various access methods of S3, the latency always increases with reading bytes. I think there is no benefits to separates meta block and data blocks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first question, as @TennyZhuang explained, meta blocks will be touched more than data blocks, and not frequently touched meta blocks are supposed to be evicted by design. The behavior is not different from the normal block.

data blocks are less useful if the corresponding meta block is evicted because we need a extra GET before we can leverage the data block cache entries. If we use the same cache policy for meta block, meta block can be evicted beforehand (for example, if we have 1:10 ratio for meta: data, putting a data block of a new SST into cache can evict meta block from an existing SST even though its data blocks are not all evicted because data blocks are put into cache after meta block). I think we need some microbenchmark before reaching a conclusion but I am okay with starting with the simplest approach first.

For the second question, as I benched latencies of various access methods of S3, the latency always increases with reading bytes. I think there is no benefits to separates meta block and data blocks.

I think you mean there is no benefits to combine meta block and data blocks into a single object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, if we have 1:10 ratio for meta: data

The consulation based on an assumption that accessing data block always needs to access its meta first. I'll check our code if we actually did that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean there is no benefits to combine meta block and data blocks into a single object.

I mean either combing or separating meta and block in one object is okay.

@MrCroxx
Copy link
Contributor Author

MrCroxx commented Mar 7, 2022

If decoding affects the performances a lot, meta cache will not be removed in SstableStore. But meta is still supposed to be block, for Secondary Cache can also benefits from it.

@MrCroxx MrCroxx marked this pull request as draft March 7, 2022 14:16
@@ -84,10 +84,17 @@ impl StateStoreImpl {
unimplemented!("{} Hummock only supports s3 and minio for now.", other)
}
};

let checksum_algo = match config.checksum_algo.as_str() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually im curious since we have made the checksum algorithm configurable, will we store the config value in the footer/header of every sst file? I didn't find this entry in SstableMeta.

@MrCroxx
Copy link
Contributor Author

MrCroxx commented Mar 31, 2022

Duplicated with #1447 , #1462 . Close.

@MrCroxx MrCroxx closed this Mar 31, 2022
@MrCroxx MrCroxx deleted the xx/meta-block branch March 31, 2022 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants