feat(hummock): make meta to be block and fit into block cache #732
Codecov Report
@@ Coverage Diff @@
## main #732 +/- ##
=========================================
Coverage 71.73% 71.74%
Complexity 2706 2706
=========================================
Files 901 901
Lines 52333 52386 +53
Branches 1781 1781
=========================================
+ Hits 37540 37583 +43
- Misses 13978 13988 +10
Partials 815 815
Seems decoding |
@@ -64,8 +61,12 @@ impl SstableStore {
let block = Block::decode(data.slice(offset..offset + len), offset)?;
self.block_cache
    .insert(sst.id, block_idx as u64, block)
assert!(block_idx != META_BLOCK_IDX)?
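The suggested assertion guards against a data block being inserted under the index reserved for the meta block. A minimal sketch of that idea, assuming the meta block shares one cache with the data blocks under a sentinel index: the value chosen for `META_BLOCK_IDX` (`u64::MAX`), the `BlockCache` type, and its methods are hypothetical and are not taken from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical sentinel: the real value of META_BLOCK_IDX is not shown in this thread.
const META_BLOCK_IDX: u64 = u64::MAX;

/// Toy cache keyed by (sst_id, block_idx); it stands in for the real block cache.
#[derive(Default)]
struct BlockCache {
    blocks: HashMap<(u64, u64), Vec<u8>>,
}

impl BlockCache {
    /// Insert a data block. Panics if the caller passes the reserved meta index.
    fn insert_data(&mut self, sst_id: u64, block_idx: u64, block: Vec<u8>) {
        assert!(
            block_idx != META_BLOCK_IDX,
            "block_idx collides with the meta block slot"
        );
        self.blocks.insert((sst_id, block_idx), block);
    }

    /// Insert the meta block of an SST under the reserved index.
    fn insert_meta(&mut self, sst_id: u64, meta: Vec<u8>) {
        self.blocks.insert((sst_id, META_BLOCK_IDX), meta);
    }

    /// Look up any block, data or meta, by its key.
    fn get(&self, sst_id: u64, block_idx: u64) -> Option<&Vec<u8>> {
        self.blocks.get(&(sst_id, block_idx))
    }
}
```

With this layout, `get(sst_id, META_BLOCK_IDX)` returns the meta block, and a data insert that used the sentinel index would panic instead of silently overwriting it.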
Block::decode(buf, 0)
};

let meta_block = match policy {
should meta block cache entry live as long as the data block cache entries of the same SST?
random question: what is the benefit of putting meta and data into two different files? Can we put meta block as the first block of SST and use S3 range GET to fetch meta block?
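The range GET idea relies on the HTTP `Range` header, whose byte ranges are inclusive on both ends. A minimal sketch of building that header value, assuming the meta block's offset and length inside the object are already known; the helper name `range_header` is hypothetical, and how the length would be discovered is not covered here.

```rust
/// Build the value of an HTTP `Range` header covering `len` bytes starting at `offset`.
/// Byte ranges are inclusive, so the last byte is `offset + len - 1`.
/// Returns `None` for an empty range or if the end position overflows `u64`.
fn range_header(offset: u64, len: u64) -> Option<String> {
    if len == 0 {
        return None;
    }
    let last = offset.checked_add(len - 1)?;
    Some(format!("bytes={}-{}", offset, last))
}
```

For a meta block stored as the first 4096 bytes of the object, `range_header(0, 4096)` yields `bytes=0-4095`.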
should meta block cache entry live as long as the data block cache entries of the same SST?
For the first question, as @TennyZhuang explained, meta blocks will be touched more often than data blocks, and infrequently touched meta blocks are supposed to be evicted by design. The behavior is no different from that of a normal block.
For the second question, when I benchmarked the latency of various S3 access methods, the latency always increased with the number of bytes read. I think there is no benefit to separating the meta block and data blocks.
For the first question, as @TennyZhuang explained, meta blocks will be touched more than data blocks, and not frequently touched meta blocks are supposed to be evicted by design. The behavior is not different from the normal block.
Data blocks are less useful if the corresponding meta block is evicted, because we need an extra GET before we can leverage the data block cache entries. If we use the same cache policy for the meta block, it can be evicted before its data blocks. For example, with a 1:10 meta:data ratio, putting a data block of a new SST into the cache can evict the meta block of an existing SST even though not all of its data blocks are evicted, because data blocks are put into the cache after the meta block. I think we need some microbenchmarks before reaching a conclusion, but I am okay with starting with the simplest approach first.
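The eviction order described here can be reproduced with a toy LRU. This is only a sketch under stated assumptions: the `Lru` type and `evicted_by_new_sst` helper are hypothetical, the cache counts entries rather than bytes, and reading a data block does not refresh its meta block (the assumption questioned later in this thread).

```rust
use std::collections::VecDeque;

/// Cache key: (sst_id, block). `None` marks the meta block of that SST.
type Key = (u64, Option<u64>);

/// Tiny LRU over keys only; the front of the queue is the least recently used entry.
struct Lru {
    capacity: usize,
    order: VecDeque<Key>,
}

impl Lru {
    fn new(capacity: usize) -> Self {
        Lru { capacity, order: VecDeque::new() }
    }

    /// Insert or refresh `key`; returns the evicted key, if any.
    fn touch(&mut self, key: Key) -> Option<Key> {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key);
        if self.order.len() > self.capacity {
            self.order.pop_front()
        } else {
            None
        }
    }

    fn contains(&self, key: Key) -> bool {
        self.order.contains(&key)
    }
}

/// Load the meta block of SST 1, then `data_blocks` of its data blocks, then one
/// data block of SST 2. Returns the entry evicted by that last insert.
fn evicted_by_new_sst(capacity: usize, data_blocks: u64) -> Option<Key> {
    let mut lru = Lru::new(capacity);
    lru.touch((1, None)); // meta goes in first, so it is the oldest entry
    for i in 0..data_blocks {
        lru.touch((1, Some(i)));
    }
    lru.touch((2, Some(0)))
}
```

With a capacity of 11 entries and 10 data blocks, the first block of the new SST evicts `(1, None)`, the meta block, while all ten data blocks of SST 1 stay cached. If every data block read also touched the meta block, the meta block would instead be the most recently used entry and would survive.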
For the second question, as I benched latencies of various access methods of S3, the latency always increases with reading bytes. I think there is no benefits to separates meta block and data blocks.
I think you mean there is no benefits to combine meta block and data blocks into a single object.
for example, if we have 1:10 ratio for meta: data
The conclusion is based on the assumption that accessing a data block always requires accessing its meta block first. I'll check whether our code actually does that.
I think you mean there is no benefits to combine meta block and data blocks into a single object.
I mean that either combining the meta block and data blocks into one object or separating them is okay.
If decoding affects the performances a lot, meta cache will not be removed in |
@@ -84,10 +84,17 @@ impl StateStoreImpl {
unimplemented!("{} Hummock only supports s3 and minio for now.", other)
}
};

let checksum_algo = match config.checksum_algo.as_str() {
Actually, I'm curious: since we have made the checksum algorithm configurable, will we store the config value in the footer/header of every SST file? I didn't find this entry in SstableMeta.
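One way to make each SST self-describing would be to persist a one-byte algorithm tag alongside the footer, so a reader does not depend on the current config value. The sketch below is purely illustrative: the `ChecksumAlgo` enum, its variants, the config strings, the tag values, and both footer helpers are assumptions, not the PR's actual format.

```rust
/// Hypothetical algorithm set; the variants accepted by the PR are not listed in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChecksumAlgo {
    Crc32c,
    XxHash64,
}

impl ChecksumAlgo {
    /// Parse a config string (the accepted strings here are assumptions).
    fn from_config(s: &str) -> Option<Self> {
        match s {
            "crc32c" => Some(Self::Crc32c),
            "xxhash64" => Some(Self::XxHash64),
            _ => None,
        }
    }

    /// Stable on-disk tag; values must never be reused once written.
    fn to_tag(self) -> u8 {
        match self {
            Self::Crc32c => 1,
            Self::XxHash64 => 2,
        }
    }

    /// Decode a tag read from disk; unknown tags yield `None`.
    fn from_tag(tag: u8) -> Option<Self> {
        match tag {
            1 => Some(Self::Crc32c),
            2 => Some(Self::XxHash64),
            _ => None,
        }
    }
}

/// Append the algorithm tag as the last byte of an encoded footer.
fn write_footer_tag(footer: &mut Vec<u8>, algo: ChecksumAlgo) {
    footer.push(algo.to_tag());
}

/// Read the algorithm tag back from the last byte of the footer.
fn read_footer_tag(footer: &[u8]) -> Option<ChecksumAlgo> {
    footer.last().copied().and_then(ChecksumAlgo::from_tag)
}
```

A reader would then pick the verification routine from `read_footer_tag` rather than from the config, which keeps old SSTs readable after the configured algorithm changes.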
What's changed and what's your intention?
Make the SST meta a block so that it fits into the block cache.
TODOs (in other PRs):
Checklist
Refer to a related PR or issue link (optional)
#537