Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: read break down #1094

Merged
merged 3 commits into from
Mar 21, 2022
Merged

feat: read break down #1094

merged 3 commits into from
Mar 21, 2022

Conversation

twocode
Copy link
Contributor

@twocode twocode commented Mar 21, 2022

Add telemetry for read path telemetry for shared buffer, block cache, iterators to answer several questions:

  1. Get/range_scan/reverse_range_scan ops and throughput issued at state store layer, and shared_buffer hit counts.
  2. Get/range_scan/reverse_range_scan duration, from state store layer.
  3. SSTable block IO durations (fetch from remote, put to remote)
  4. SSTable block requests/second (read, block cache hit, fetch remote, put remote)
  5. SSTable ops (bloom filter true negative counts, possible positive counts, # tableiterators of merge iterators)
  6. MergeIterator ops (next, seek), duration (seek, next).

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

Add telemetry for read path telemetry for shared buffer, block cache, iterators to answer several questions:
1. Get/range_scan/reverse_range_scan ops and throughput issued at state store layer, and shared_buffer hit counts.
2. Get/range_scan/reverse_range_scan duration, from state store layer.
3. SSTable block IO durations (fetch from remote, put to remote)
4. SSTable block requests/second (read, block cache hit, fetch remote, put remote)
5. SSTable ops (bloom filter true negative counts, possible positive counts, # tableiterators of merge iterators)
6. MergeIterator ops (next, seek), duration (seek, next).
@skyzh
Copy link
Contributor

skyzh commented Mar 21, 2022

Why do we need MergeIterator ops (next, seek), duration (seek, next)? Isn't UserIterator's metrics enough?

If you intend to add metrics to MergeIterator, what about ConcatIterator, ReverseMergeIterator, and ReverseConcatIterator?

Copy link
Contributor

@wcy-fdu wcy-fdu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the rest!

@@ -56,14 +58,21 @@ pub struct MergeIteratorInner<'a, const DIRECTION: usize> {

/// The heap for merge sort.
heap: BinaryHeap<Node<'a, DIRECTION>>,

/// Statistics.
stats: Option<Arc<StateStoreMetrics>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if remove Option, and pass StateStoreMetrics::unused() in unit tests?
Now all these metrics structs implemented a interface xxx::unused() for unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if remove Option, and pass StateStoreMetrics::unused() in unit tests? Now all these metrics structs implemented a interface xxx::unused() for unit tests.

Sounds good!

@@ -53,7 +53,7 @@ mod test {
.map(|x| Box::new(x) as BoxedHummockIterator)
.collect_vec();

let mut mi = ReverseMergeIterator::new(iters);
let mut mi = ReverseMergeIterator::new(iters, None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to pass StateStoreMetrics::unused()

@twocode
Copy link
Contributor Author

twocode commented Mar 21, 2022

Why do we need MergeIterator ops (next, seek), duration (seek, next)? Isn't UserIterator's metrics enough?

No, get does not use UserIterator.

If you intend to add metrics to MergeIterator, what about ConcatIterator, ReverseMergeIterator, and ReverseConcatIterator?

MI is special because it backs all operations (get, scan, iter) and combines all table iterators, and it operates on local version manager directly so it has a view point of current LSM tree shape. ConcatIterator is just a glue of table iterators and trivial to monitor (except the true positive rate of bloom filter).

@skyzh
Copy link
Contributor

skyzh commented Mar 21, 2022

I'm afraid such metrics is too fine-grained for a storage system... I think they're better enabled only on perf instead of directly expose them to users. Every time series at Prometheus has overhead.

@@ -318,7 +319,7 @@ mod tests {
.map(|x| Box::new(x) as BoxedHummockIterator)
.collect_vec();

let mi = MergeIterator::new(iters);
let mi = MergeIterator::new(iters, Arc::new(StateStoreMetrics::unused()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend having a MergeIterator::new_for_test(iters);

@twocode
Copy link
Contributor Author

twocode commented Mar 21, 2022

I'm afraid such metrics is too fine-grained for a storage system... I think they're better enabled only on perf instead of directly expose them to users. Every time series at Prometheus has overhead.

These metrics are added because I find them quite basic to be missing when I want to have a clue about some streaming TPCH workload. I also don't want the metrics to be too granular to maintain and impact performance.

@twocode twocode merged commit b6f468a into main Mar 21, 2022
@twocode twocode deleted the xiangyhu/readtelem branch March 21, 2022 04:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants