-
Notifications
You must be signed in to change notification settings - Fork 600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: read break down #1094
feat: read break down #1094
Conversation
Add telemetry for read path telemetry for shared buffer, block cache, iterators to answer several questions: 1. Get/range_scan/reverse_range_scan ops and throughput issued at state store layer, and shared_buffer hit counts. 2. Get/range_scan/reverse_range_scan duration, from state store layer. 3. SSTable block IO durations (fetch from remote, put to remote) 4. SSTable block requests/second (read, block cache hit, fetch remote, put remote) 5. SSTable ops (bloom filter true negative counts, possible positive counts, # tableiterators of merge iterators) 6. MergeIterator ops (next, seek), duration (seek, next).
Why do we need MergeIterator ops (next, seek), duration (seek, next)? Isn't UserIterator's metrics enough? If you intend to add metrics to MergeIterator, what about ConcatIterator, ReverseMergeIterator, and ReverseConcatIterator? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the rest!
@@ -56,14 +58,21 @@ pub struct MergeIteratorInner<'a, const DIRECTION: usize> { | |||
|
|||
/// The heap for merge sort. | |||
heap: BinaryHeap<Node<'a, DIRECTION>>, | |||
|
|||
/// Statistics. | |||
stats: Option<Arc<StateStoreMetrics>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if remove Option
, and pass StateStoreMetrics::unused()
in unit tests?
Now all these metrics structs implemented a interface xxx::unused()
for unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if remove
Option
, and passStateStoreMetrics::unused()
in unit tests? Now all these metrics structs implemented a interfacexxx::unused()
for unit tests.
Sounds good!
@@ -53,7 +53,7 @@ mod test { | |||
.map(|x| Box::new(x) as BoxedHummockIterator) | |||
.collect_vec(); | |||
|
|||
let mut mi = ReverseMergeIterator::new(iters); | |||
let mut mi = ReverseMergeIterator::new(iters, None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to pass StateStoreMetrics::unused()
No,
MI is special because it backs all operations (get, scan, iter) and combines all table iterators, and it operates on local version manager directly so it has a view point of current LSM tree shape. ConcatIterator is just a glue of table iterators and trivial to monitor (except the true positive rate of bloom filter). |
I'm afraid such metrics is too fine-grained for a storage system... I think they're better enabled only on perf instead of directly expose them to users. Every time series at Prometheus has overhead. |
@@ -318,7 +319,7 @@ mod tests { | |||
.map(|x| Box::new(x) as BoxedHummockIterator) | |||
.collect_vec(); | |||
|
|||
let mi = MergeIterator::new(iters); | |||
let mi = MergeIterator::new(iters, Arc::new(StateStoreMetrics::unused())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend having a MergeIterator::new_for_test(iters);
These metrics are added because I find them quite basic to be missing when I want to have a clue about some streaming TPCH workload. I also don't want the metrics to be too granular to maintain and impact performance. |
Add telemetry for read path telemetry for shared buffer, block cache, iterators to answer several questions:
Checklist
Refer to a related PR or issue link (optional)