Clarify definition of "network health." #4729
@@ -190,11 +190,27 @@ lower) fee to get into the same position as a reference transaction.

### Consensus Health

-For consensus to be considered healthy, the consensus process must take
-less than 5 seconds. This time limit was chosen based on observed past
-behavior of the network. Note that this is not necessarily the time between
-ledger closings, as consensus usually starts some amount of time after
-a ledger opens.
+For consensus to be considered healthy, the peers on the network
+should largely remain in sync with one another. It is particularly
+important for the validators to remain in sync, because that is required
+for participation in consensus. However, the network tolerates some
+validators being out of sync. Fundamentally, network health is a
+function of validators reaching consensus on sets of recently submitted
+transactions.
+
+Another factor to consider is
+the duration of the consensus process itself. This generally takes
+under 5 seconds on the main network under low volume. This is based on
+historical observations. However, factors such as transaction volume
+can increase consensus duration. This is because rippled performs
+more work as transaction volume increases. Under sufficient load this
+tends to increase consensus duration. It's possible that relatively high
+consensus duration indicates a problem, but it is not appropriate to
+conclude so without investigation. The upper limit for consensus
+duration should be roughly 20 seconds. That is far above normal.
+If the network takes this long to close ledgers, then it is almost
+certain that there is a problem with the network. This circumstance
+often coincides with new ledgers with zero transactions.
So the consensus process takes >20 seconds, although no transactions were included in the ledger. Can we list any factors that might cause this issue? Historically, have such problems occurred on the mainnet or other affiliated blockchain networks? Can we provide a link to such an example?
That goes beyond clarifying what stability is and gets into speculation and diagnostics.
I feel giving such examples would provide more clarity. As it stands, the reader does not understand why the network could become unstable.
### Other Constants

Can we use the Reliability Score as a proxy for measuring the network health? It seems to indicate the degree of similarity in the calculations between validators.
I don't know what that is, and it brings me to a site that asks for my email address.
Sorry, the link is wrong. Here's the reference: https://xrpl.org/negative-unl.html#reliability-measurement
@ckeshava
The problem with the existing document is that it can be misinterpreted to mean that 5s latency in consensus is some extreme upper limit, beyond which the network is in a faulty state. This PR corrects the language and hopefully encourages approaching the issue with some nuance. I didn't intend this to be an exhaustive treatment of all the ways that the network can have problems, or of the different diagnostics and measurements that can be done. That is actually quite a sizable topic. For now I prefer that this stays concise and mainly clarifies the original statement.
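
To make the distinction concrete, here is a minimal Python sketch of how the two figures in the revised text might be read. It is an illustration only and not part of rippled: the names `ConsensusStatus` and `classify_consensus` are hypothetical, and the thresholds are simply the approximate 5 second and 20 second values from the proposed wording, treated here as hard cut-offs for the sake of the example.

```python
from enum import Enum

# Assumed thresholds, copied from the revised documentation text.
# Both are rough figures, not constants defined anywhere in rippled.
TYPICAL_SECONDS = 5.0       # usual consensus duration under low volume
UPPER_LIMIT_SECONDS = 20.0  # rough upper limit; beyond this a problem is likely


class ConsensusStatus(Enum):
    NORMAL = "normal"
    ELEVATED = "elevated"              # may be load; investigate before concluding
    LIKELY_PROBLEM = "likely problem"  # far above normal


def classify_consensus(duration_seconds: float) -> ConsensusStatus:
    """Map one observed consensus duration onto the three cases in the text."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if duration_seconds >= UPPER_LIMIT_SECONDS:
        return ConsensusStatus.LIKELY_PROBLEM
    if duration_seconds >= TYPICAL_SECONDS:
        return ConsensusStatus.ELEVATED
    return ConsensusStatus.NORMAL


if __name__ == "__main__":
    for seconds in (3.2, 8.0, 25.0):
        print(seconds, classify_consensus(seconds).value)
```

The result has three states rather than a healthy/unhealthy boolean because the text says a duration above 5 seconds may simply reflect transaction volume and should not be treated as a fault without investigation.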