batch: Add metrics for tasks. #3832

liurenjie1024 · 2022-07-13T06:57:16Z

Add task-level metrics, including the number of incoming rows and the data size for each exchange source, and the number of outgoing rows and the data size for each output.

BowenXiao1999 · 2022-08-08T09:37:41Z

My notes for this part:

We can collect metrics for batch executors the same way StreamingMetrics does in streaming_stats.rs.

The metrics "exchange_recv_size" and "exchange_frag_recv_size" are a good reference for collecting the accumulated input rows and output rows of a task.
Their PR: #3696

But it seems we cannot get the input recv size for a fragment that contains a table scan: such fragments do not have an exchange source as input.

cc @ZENOTME

ZENOTME · 2022-08-17T05:06:30Z

The problem with this registry is that we need to clean it up after a query finishes.
There are two places we should clean up:

metrics in CN
metrics in the Prometheus server

For the cleanup job, I think we can implement it with a 'range search': for example, batch_exchange_recv_row_number{query_id="aaa"} returns every item whose query_id equals "aaa".
Metrics in the Prometheus server support 'range search'.
Metrics in CN do not support 'range search'.

So I think we can clean up the metrics in Prometheus first.
Cleaning up the metrics in CN seems complicated to implement: we would need to record every {queryID, source_stage_id, target_stage_id, source_task_id, target_task_id} tuple. @BowenXiao1999 @liurenjie1024
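
One way to picture the bookkeeping described above: alongside the label-keyed counters, keep an index from query ID to every label tuple created for that query, so cleanup can enumerate them without any label search. This is a minimal std-only sketch. `ExchangeLabels`, `QueryMetricsIndex`, and their methods are hypothetical names chosen for illustration, not the actual BatchMetrics code or the prometheus crate API.

```rust
use std::collections::HashMap;

/// Hypothetical label tuple for one exchange edge (field names follow the comment above).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ExchangeLabels {
    pub query_id: String,
    pub source_stage_id: u32,
    pub target_stage_id: u32,
    pub source_task_id: u32,
    pub target_task_id: u32,
}

/// Row counters keyed by the full label tuple, plus a per-query index used for cleanup.
#[derive(Default)]
pub struct QueryMetricsIndex {
    recv_rows: HashMap<ExchangeLabels, u64>,
    by_query: HashMap<String, Vec<ExchangeLabels>>,
}

impl QueryMetricsIndex {
    /// Add `rows` to the counter for `labels`, recording the tuple on first use.
    pub fn add_rows(&mut self, labels: &ExchangeLabels, rows: u64) {
        if !self.recv_rows.contains_key(labels) {
            self.by_query
                .entry(labels.query_id.clone())
                .or_default()
                .push(labels.clone());
        }
        *self.recv_rows.entry(labels.clone()).or_insert(0) += rows;
    }

    /// Remove every counter that belongs to `query_id`; returns how many were removed.
    pub fn remove_query(&mut self, query_id: &str) -> usize {
        let tuples = self.by_query.remove(query_id).unwrap_or_default();
        for labels in &tuples {
            self.recv_rows.remove(labels);
        }
        tuples.len()
    }

    /// Number of label tuples that currently have a counter.
    pub fn series_count(&self) -> usize {
        self.recv_rows.len()
    }
}
```

The extra index costs memory proportional to the number of live label tuples, which is the recording overhead the comment above is worried about.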

BowenXiao1999 · 2022-08-17T05:37:52Z

WDYM by metrics in cn? I think they are all metrics in prometheus

ZENOTME · 2022-08-17T05:47:43Z

> WDYM by metrics in cn? I think they are all metrics in prometheus

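I think "metrics in CN" means a local data structure like this:

use prometheus::{core::{AtomicU64, GenericCounterVec}, Histogram};

pub struct BatchMetrics {
    // Histogram of durations, named after the row seq scan `next` call.
    pub row_seq_scan_next_duration: Histogram,
    // Counter vector for received rows, one child per label combination.
    pub exchange_recv_row_number: GenericCounterVec<AtomicU64>,
}

And we send these metrics to the Prometheus server, which then stores them. (Is that the "metrics in Prometheus server"?)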

skyzh · 2022-08-17T05:58:41Z

Metrics are stored in Prometheus server, but they are not sent. Prometheus will pull metrics from each compute node.

skyzh · 2022-08-17T05:59:12Z

What's the difference between metrics in CN and metrics in Prometheus?

BowenXiao1999 · 2022-08-17T06:08:23Z

> > WDYM by metrics in cn? I think they are all metrics in prometheus
>
> I think "metrics in CN" means a local data structure like this:
> pub struct BatchMetrics {
>     pub row_seq_scan_next_duration: Histogram,
>     pub exchange_recv_row_number: GenericCounterVec<AtomicU64>,
> }
> And we send these metrics to the Prometheus server, which then stores them. (Is that the "metrics in Prometheus server"?)

I think calling .delete_label_values will delete all of them? Metrics in CN and in Prometheus should both be taken care of by the library; users should not need to care about the details or the cache.

ZENOTME · 2022-08-17T06:28:28Z

> What's the difference between metrics in CN and metrics in Prometheus?

For example, there is a metric in CN like this (or maybe I should call it a value in the Prometheus client):

#[derive(Debug)]
pub(crate) struct Metric {
    value: u64,
}

> Metrics are stored in Prometheus server, but they are not sent. Prometheus will pull metrics from each compute node.

And as you say, this metric (value) will be pulled by Prometheus and stored in the Prometheus server.

The metrics in Prometheus are pulled from the metrics in CN.

> I think call .delete_label_values will delete all? Metrics in CN and in Prometheus should both be take cared by the lib, and user do not need to care the detail/cache.

I looked up the implementation and found that delete_label_values only deletes the metrics in CN (see children.remove(&h) below).
I'm not sure whether it is synced with the metrics in Prometheus.

pub fn delete(&self, labels: &HashMap<&str, &str>) -> Result<()> {
    let h = self.hash_labels(labels)?;

    let mut children = self.children.write();
    // `children` is a HashMap<u64, T>: only the local (CN-side) entry is removed.
    if children.remove(&h).is_none() {
        return Err(Error::Msg(format!("missing labels {:?}", labels)));
    }

    Ok(())
}
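
To make the distinction concrete, here is a small std-only model of the situation, under loud assumptions: `LocalCounterVec` stands in for the client-side metric held in CN memory, and `ScrapedSamples` stands in for whatever the server stored from earlier pulls. Both types are hypothetical and say nothing about how Prometheus stores data internally. The sketch only illustrates that a local removal cannot reach samples that were already pulled.

```rust
use std::collections::HashMap;

/// Stand-in for the client-side vector of labelled counters kept in CN memory.
#[derive(Default)]
pub struct LocalCounterVec {
    children: HashMap<String, u64>, // keyed by one label value, e.g. query_id
}

impl LocalCounterVec {
    pub fn inc_by(&mut self, label: &str, n: u64) {
        *self.children.entry(label.to_string()).or_insert(0) += n;
    }

    /// Local removal only, mirroring `children.remove(&h)` in the snippet above.
    pub fn delete(&mut self, label: &str) -> Result<(), String> {
        match self.children.remove(label) {
            Some(_) => Ok(()),
            None => Err(format!("missing labels {:?}", label)),
        }
    }
}

/// Stand-in for the server side: it keeps its own copy of every pulled sample.
#[derive(Default)]
pub struct ScrapedSamples {
    samples: Vec<(u64, String, u64)>, // (scrape time, label, value)
}

impl ScrapedSamples {
    /// Pull whatever the node exposes at this moment; the node never pushes.
    pub fn scrape(&mut self, ts: u64, node: &LocalCounterVec) {
        for (label, value) in &node.children {
            self.samples.push((ts, label.clone(), *value));
        }
    }

    /// Number of stored samples carrying the given label.
    pub fn count_for(&self, label: &str) -> usize {
        self.samples
            .iter()
            .filter(|(_, l, _)| l.as_str() == label)
            .count()
    }
}
```

In this model, deleting "aaa" locally after one scrape leaves that one stored sample on the server side, and later scrapes simply stop seeing the series.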

liurenjie1024 · 2022-08-17T06:54:57Z

We don't need to care about deleting metrics in Prometheus; the Prometheus server will delete them after a preconfigured interval.
For deleting task-level metrics, here are the changes:
a. Maintain one BatchMetrics for each BatchTaskExecution.
b. When a batch task finishes or aborts, move it to a deletion queue, which deletes its metrics after several minutes. It's important not to delete them immediately, since Prometheus pulls data periodically.
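
The delayed deletion in (b) could be sketched roughly as follows. This is a std-only illustration, not the implementation that was merged: `DeletionQueue`, the string task IDs, and the 300-second delay in the usage note are all assumptions, and the caller is expected to perform the actual metric removal for each returned task.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical queue of finished tasks whose metrics await deletion.
pub struct DeletionQueue {
    delay: Duration,
    // (finish time, task id); assumes tasks are enqueued in finish-time order.
    pending: VecDeque<(Instant, String)>,
}

impl DeletionQueue {
    pub fn new(delay: Duration) -> Self {
        Self { delay, pending: VecDeque::new() }
    }

    /// Called when a task finishes or aborts.
    pub fn enqueue(&mut self, task_id: &str, finished_at: Instant) {
        self.pending.push_back((finished_at, task_id.to_string()));
    }

    /// Pop every task whose delay has elapsed at `now`.
    /// The caller then deletes the metrics of the returned tasks.
    pub fn drain_expired(&mut self, now: Instant) -> Vec<String> {
        let mut expired = Vec::new();
        while let Some((finished_at, _)) = self.pending.front() {
            if now.saturating_duration_since(*finished_at) < self.delay {
                break; // the oldest entry is still too fresh, so all later ones are too
            }
            if let Some((_, task_id)) = self.pending.pop_front() {
                expired.push(task_id);
            }
        }
        expired
    }
}
```

With a delay of, say, `Duration::from_secs(300)`, a periodic background job would call `drain_expired(Instant::now())` and remove the metrics of each returned task. The delay should comfortably exceed the scrape interval so the final values are pulled at least once before they disappear.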

BowenXiao1999 · 2022-09-22T05:12:02Z

Closed

liurenjie1024 mentioned this issue Jul 13, 2022

batch: Improve distributed query engine. #1977

Closed

18 tasks

liurenjie1024 changed the title ~~Add metrics for tasks.~~ batch: Add metrics for tasks. Jul 13, 2022

liurenjie1024 added the type/feature and component/batch labels Jul 13, 2022

ZENOTME mentioned this issue Aug 11, 2022

feat(batch):add metrics for batch exchange executor #4577

Merged

3 tasks

ZENOTME mentioned this issue Aug 19, 2022

feat(batch): add task level metrics delete #4757

Merged

3 tasks

BowenXiao1999 closed this as completed Sep 22, 2022

Gun9niR mentioned this issue Oct 10, 2022

BatchTaskMetrics uses unbounded memory #5743

Closed

MrCroxx mentioned this issue May 10, 2023

Per actor metrics: should be cleaned when the actor is dropped or moved. #9492

Closed
