More metrics for the EL monitor #4707
* `engine_api_response_time` provides a histogram of Engine API response times for each unique pair of URL and request type.
* All Engine API requests are now tracked.

Other changes:

* The client will no longer exit on start-up if it fails to connect to a properly configured EL node.
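To illustrate what "a histogram for each unique pair of URL and request type" means in practice, here is a minimal, library-independent sketch in Python (not the Nim code from this PR). The names `durations` and `track_request`, and the URL in the usage note, are hypothetical and chosen only for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: (url, request) -> observed durations in seconds.
durations = defaultdict(list)

@contextmanager
def track_request(url, request, clock=time.monotonic):
    """Time the enclosed block and record it under the (url, request) pair."""
    start = clock()
    try:
        yield
    finally:
        # Recorded in `finally`, so failed requests are tracked as well.
        durations[(url, request)].append(clock() - start)
```

Usage would look like `with track_request("http://127.0.0.1:8551", "newPayload"): ...`, with one series of observations accumulating per distinct pair.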
Looking good:
@@ -259,6 +259,11 @@ declareCounter engine_api_responses,
  "Number of successful requests to the newPayload Engine API end-point",
  labels = ["url", "request", "status"]

declareHistogram engine_api_response_time,
A more conventional naming for histograms is `duration`, not `time`. Also, the name should include the units used.
So maybe more like `engine_api_request_duration_seconds`.
...should have a suffix describing the unit, in plural form. Note that an accumulating count has `total` as a suffix, in addition to the unit if applicable.
* `http_request_duration_seconds`
* `node_memory_usage_bytes`
@@ -259,6 +259,11 @@ declareCounter engine_api_responses,
  "Number of successful requests to the newPayload Engine API end-point",
  labels = ["url", "request", "status"]

declareHistogram engine_api_response_time,
  "Time(s) used to generate signature usign remote signer",
  buckets = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0],
Could we maybe have more fine-grained buckets? Here's an example from Cortex:
[email protected]:~ % c 0:9092/metrics | grep cortex_request_duration_seconds_bucket | grep ingester_ring
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.005"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.01"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.025"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.05"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.1"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.25"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="0.5"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="1"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="2.5"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="5"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="10"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="25"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="50"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="100"} 2
cortex_request_duration_seconds_bucket{method="GET",route="ingester_ring",status_code="200",ws="false",le="+Inf"} 2
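To make the granularity point concrete, here is a small sketch in plain Python (not the Nim metrics library used in the PR) that counts the same samples into the buckets from the diff and into the bounds from the Cortex output above. The sample latencies are made up for illustration, and `cumulative_bucket_counts` is a hypothetical helper.

```python
from bisect import bisect_left

def cumulative_bucket_counts(samples, bounds):
    """Return Prometheus-style cumulative (le, count) pairs, ending with +Inf."""
    bounds = sorted(bounds)
    per_bucket = [0] * (len(bounds) + 1)  # last slot is the +Inf overflow
    for s in samples:
        # First bound >= s, i.e. the smallest `le` bucket that contains s.
        per_bucket[bisect_left(bounds, s)] += 1
    result, running = [], 0
    for le, n in zip([*bounds, float("inf")], per_bucket):
        running += n
        result.append((le, running))
    return result

# Buckets from the diff under review.
COARSE = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0]
# Upper bounds taken from the Cortex output above.
FINE = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 25, 50, 100]

# Made-up latencies in seconds, mostly well under 100 ms.
samples = [0.003, 0.008, 0.02, 0.04, 0.3]

for name, bounds in (("coarse", COARSE), ("fine", FINE)):
    print(name, cumulative_bucket_counts(samples, bounds))
```

With the coarse bounds, the four sub-100 ms samples all land in the first bucket (`le="0.1"`), so a 3 ms response and a 40 ms response cannot be told apart. The finer bounds spread them over four separate buckets.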
The suggested changes have been pushed to |
Broken tho:
That's expected. The earlier version of the code didn't track all requests.