More metrics for the EL monitor #4707

zah · 2023-03-09T14:34:51Z

* `engine_api_response_time` provides a histogram of the Engine API response times for each unique pair of URL and request type.
* All Engine API requests are now tracked.

Other changes:

The client will no longer exit on start-up if it fails to connect to a properly configured EL node.

jakubgs · 2023-03-09T16:13:06Z

Looking good:

[email protected]:~ % c http://10.14.0.70:9202/metrics | grep engine_api_response_time | grep newPayload
engine_api_response_time_sum{url="http://localhost:8554",request="newPayload"} 6.565
engine_api_response_time_count{url="http://localhost:8554",request="newPayload"} 65.0
engine_api_response_time_created{url="http://localhost:8554",request="newPayload"} 1678377338.0
engine_api_response_time_bucket{url="http://localhost:8554",request="newPayload",le="0.1"} 36.0
engine_api_response_time_bucket{url="http://localhost:8554",request="newPayload",le="0.25"} 65.0
engine_api_response_time_bucket{url="http://localhost:8554",request="newPayload",le="0.5"} 65.0
engine_api_response_time_bucket{url="http://localhost:8554",request="newPayload",le="1.0"} 65.0
engine_api_response_time_bucket{url="http://localhost:8554",request="newPayload",le="2.0"} 65.0
engine_api_response_time_bucket{url="http://localhost:8554",request="newPayload",le="5.0"} 65.0
engine_api_response_time_bucket{url="http://localhost:8554",request="newPayload",le="+Inf"} 65.0
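For anyone reading this output: the `_bucket` counts are cumulative, so each line counts every request at or below its `le` bound. The following is a minimal Python sketch, not part of the PR, that turns the numbers above into per-range counts and a mean latency. The variable and function names are made up for illustration.

```python
# Cumulative bucket counts copied from the newPayload output above: (le, count).
cumulative = [
    (0.1, 36), (0.25, 65), (0.5, 65), (1.0, 65),
    (2.0, 65), (5.0, 65), (float("inf"), 65),
]
total_sum = 6.565   # engine_api_response_time_sum
total_count = 65    # engine_api_response_time_count


def per_bucket(cumulative):
    """Convert cumulative (le, count) pairs into (lower, upper, count) ranges."""
    ranges = []
    lower, previous = 0.0, 0
    for upper, count in cumulative:
        ranges.append((lower, upper, count - previous))
        lower, previous = upper, count
    return ranges


buckets = per_bucket(cumulative)
mean = total_sum / total_count

# Print only the ranges that actually received observations.
for lower, upper, count in buckets:
    if count:
        print(f"({lower}, {upper}]: {count} requests")
print(f"mean: {mean:.3f} s")
```

With these numbers, 36 requests finished within 0.1 s and the remaining 29 fall between 0.1 s and 0.25 s, with a mean of roughly 0.101 s.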

jakubgs · 2023-03-09T16:21:43Z

Seems to work, but I'm not getting values on the vertical scale. Weird.

jakubgs · 2023-03-09T16:24:26Z

beacon_chain/eth1/eth1_monitor.nim

@@ -259,6 +259,11 @@ declareCounter engine_api_responses,
  "Number of successful requests to the newPayload Engine API end-point",
  labels = ["url", "request", "status"]

+declareHistogram engine_api_response_time,


A more conventional name for a histogram like this uses `duration`, not `time`. The name should also include the unit used.
So maybe something more like `engine_api_request_duration_seconds`.

...should have a suffix describing the unit, in plural form. Note that an accumulating count has total as a suffix, in addition to the unit if applicable.

http_request_duration_seconds

node_memory_usage_bytes

https://prometheus.io/docs/practices/naming/#metric-names

jakubgs · 2023-03-09T16:26:46Z

beacon_chain/eth1/eth1_monitor.nim

@@ -259,6 +259,11 @@ declareCounter engine_api_responses,
  "Number of successful requests to the newPayload Engine API end-point",
  labels = ["url", "request", "status"]

+declareHistogram engine_api_response_time,
+  "Time(s) used to generate signature usign remote signer",
+   buckets = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0],


Could we maybe have a bit more fine grained buckets? Here's an example from Cortex:

[email protected]:~ % c 0:9092/metrics | grep cortex_request_duration_seconds_bucket | grep ingester_ring
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.005"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.01"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.025"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.05"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.1"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.25"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.5"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="1"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="2.5"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="5"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="10"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="25"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="50"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="100"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="+Inf"} 2

jakubgs · 2023-03-09T16:43:02Z

Ok, I see what was wrong. The legend has to show only the `le` label:

And now it looks correct:

But yeah, I'd appreciate some more granular buckets.
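To illustrate why bucket granularity matters, here is a rough Python sketch of the linear interpolation commonly used to estimate quantiles from cumulative buckets. It is applied to the `newPayload` counts posted earlier in this thread. The function below is a simplified stand-in written for this comment, not the code Prometheus or Grafana actually runs, and it assumes the first bucket starts at 0.

```python
def estimate_quantile(q, cumulative):
    """Estimate quantile q (0..1) from (le, count) pairs sorted by le."""
    total = cumulative[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower, previous = 0.0, 0  # assumption: the first bucket starts at 0
    for upper, count in cumulative:
        if count > previous and count >= rank:
            if upper == float("inf"):
                return lower  # no upper bound to interpolate towards
            # Assume observations are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - previous) / (count - previous)
        lower, previous = upper, count
    return lower


# newPayload bucket counts from the earlier output in this thread.
coarse = [
    (0.1, 36), (0.25, 65), (0.5, 65), (1.0, 65),
    (2.0, 65), (5.0, 65), (float("inf"), 65),
]

p50 = estimate_quantile(0.50, coarse)
p95 = estimate_quantile(0.95, coarse)
print(f"p50 ~ {p50:.3f} s, p95 ~ {p95:.3f} s")
```

Every request slower than 0.1 s lands in the single 0.1 to 0.25 bucket, so the p95 figure is only a straight-line guess across that whole range. Finer buckets would narrow that uncertainty.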

zah · 2023-03-09T17:30:49Z

The suggested changes have been pushed to unstable in e808fda. The new metric name is `engine_api_request_duration_seconds`.

jakubgs · 2023-03-09T18:48:21Z

Broken tho:

/data/beacon-node-prater-unstable/repo/beacon_chain/eth1/eth1_monitor.nim(304, 7) Error: undeclared identifier: 'engine_api_response_time'
candidates (edit distance, scope distance); see '--spellSuggest':
 (4, 6): 'engine_api_responses' [var declared in /data/beacon-node-prater-unstable/repo/beacon_chain/eth1/eth1_monitor.nim(258, 16)]

jakubgs · 2023-03-09T23:39:05Z

Pretty cool:

`linux-04.he-eu-hel1.nimbus.prater`

`linux-05.he-eu-hel1.nimbus.prater`

`linux-06.he-eu-hel1.nimbus.prater`

Here we can clearly see that the more validators you have, the more strained the EL node is.

jakubgs · 2023-03-09T23:41:21Z

Interesting jump in getBlockByNumber:

https://metrics.status.im/d/pgeNfj2Wz23/nimbus-fleet-testnets?orgId=1&refresh=5m&var-instance=linux-05.he-eu-hel1.nimbus.prater&var-container=beacon-node-prater-unstable&from=1678394472978&to=1678405272978

zah · 2023-03-10T22:10:30Z

That's expected. The earlier version of the code didn't track all requests.

zah enabled auto-merge (squash) March 9, 2023 14:35

zah mentioned this pull request Mar 9, 2023

Metrics for Web3 request responses #3950

Closed

jakubgs reviewed Mar 9, 2023

zah disabled auto-merge March 9, 2023 17:28

zah merged commit ef20e83 into unstable Mar 9, 2023

zah deleted the el-manager-tweaks-2023-03-09 branch March 9, 2023 17:29

etan-status added a commit that referenced this pull request Feb 6, 2024

fix compilation with -d:has_deposit_root_checks

7c18243

Since #4465, compilation with `-d:has_deposit_root_checks` fails. #4707 further built on top of it but the additions also don't compile. Fix it.

etan-status mentioned this pull request Feb 6, 2024

fix compilation with -d:has_deposit_root_checks #5855

Merged

etan-status added a commit that referenced this pull request Feb 6, 2024

fix compilation with -d:has_deposit_root_checks (#5855)

f0f14f1

Since #4465, compilation with `-d:has_deposit_root_checks` fails. #4707 further built on top of it but the additions also don't compile. Fix it.
