Create new "health check" method #2809
Comments
Speaking as an engineer, not an end-user, I think that reporting the values of the individual factors along with the health score could be valuable.
It might be useful to split this up into "liveness" and "readiness" (see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/ for example), and I'm not convinced that a server with a constant 6 seconds of lag is "healthy".
Agree with Ed's point. I think this also pushes operators to understand what metrics are important to monitor. If we make this compatible with crawl, then it would be easy for anyone to scrape this data to build dashboards and monitoring of the network, which would be great.
If you want people to build dashboards, implement https://openmetrics.io/ instead of the current statsd-only metrics framework. I currently collect metrics via
Let's discuss this as part of 1.6. I added a tag to this issue. @mayurbhandary
* Gives a summary of the health of the node: Healthy, Warning, or Critical
* Last validated ledger age: < 7s is Healthy, 7s to 20s is Warning, > 20s is Critical
* If amendment blocked, Critical
* Number of peers: > 7 is Healthy, 1 to 7 is Warning, 0 is Critical
* Server state: one of full, validating or proposing is Healthy; one of syncing, tracking or connected is Warning; all other states are Critical
* Load factor: <= 100 is Healthy, 101 to 999 is Warning, >= 1000 is Critical
* If not Healthy, the info field contains the data that is considered not Healthy.

Fixes: XRPLF#2809
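To make the thresholds in this commit message easier to reason about, here is a minimal Python sketch of the classification. It is not the rippled implementation. The names `Health` and `classify`, the parameter names, and the shape of the returned `info` dict are all hypothetical, and combining the factors by taking the worst individual level is an assumption; the commit message does not spell out the combination rule.

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that max() picks the worst level.
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


# State names taken from the commit message.
HEALTHY_STATES = {"full", "validating", "proposing"}
WARNING_STATES = {"syncing", "tracking", "connected"}


def classify(validated_ledger_age, amendment_blocked, peers,
             server_state, load_factor):
    """Return (overall Health, info dict of the inputs that were not Healthy).

    Hypothetical helper: the overall level is assumed to be the worst
    individual level.
    """
    levels = {}

    # Last validated ledger age in seconds: <7 Healthy, 7..20 Warning, >20 Critical.
    if validated_ledger_age < 7:
        levels["validated_ledger_age"] = Health.HEALTHY
    elif validated_ledger_age <= 20:
        levels["validated_ledger_age"] = Health.WARNING
    else:
        levels["validated_ledger_age"] = Health.CRITICAL

    # An amendment-blocked server is always Critical.
    levels["amendment_blocked"] = (
        Health.CRITICAL if amendment_blocked else Health.HEALTHY
    )

    # Number of peers: >7 Healthy, 1..7 Warning, 0 Critical.
    if peers > 7:
        levels["peers"] = Health.HEALTHY
    elif peers >= 1:
        levels["peers"] = Health.WARNING
    else:
        levels["peers"] = Health.CRITICAL

    # Server state: any state not listed in the commit message is Critical.
    if server_state in HEALTHY_STATES:
        levels["server_state"] = Health.HEALTHY
    elif server_state in WARNING_STATES:
        levels["server_state"] = Health.WARNING
    else:
        levels["server_state"] = Health.CRITICAL

    # Load factor: <=100 Healthy, 101..999 Warning, >=1000 Critical.
    if load_factor <= 100:
        levels["load_factor"] = Health.HEALTHY
    elif load_factor < 1000:
        levels["load_factor"] = Health.WARNING
    else:
        levels["load_factor"] = Health.CRITICAL

    values = {
        "validated_ledger_age": validated_ledger_age,
        "amendment_blocked": amendment_blocked,
        "peers": peers,
        "server_state": server_state,
        "load_factor": load_factor,
    }
    # Only report the factors that are not Healthy.
    info = {k: values[k] for k, lvl in levels.items() if lvl != Health.HEALTHY}
    return max(levels.values()), info
```

For example, `classify(3, False, 10, "full", 100)` yields `Health.HEALTHY` with an empty `info`, while a server with 0 peers is `Health.CRITICAL` regardless of the other inputs.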
The `server_info` command is very cluttered and in many cases it's not easy to diagnose whether the server is healthy or not from that. A simple "health check" method could make it easier to monitor `rippled` with industry-standard tooling and also easier to diagnose manually as well.

Ideally:
* […] `peer_private`), or 0 (critical)
* `server_state` is `full`/`validating`/`proposing` (healthy), `syncing`/`tracking`/`connected` (warning), or `disconnected` (critical)
* `load_factor` based warning, too? Not sure what thresholds to use there.