Clarify definition of "network health." #4729
src/ripple/app/misc/FeeEscalation.md
Outdated
For consensus to be considered healthy, the peers on the network
should largely remain in sync with one another. It is particularly
important for the validators to remain in sync, because they must
be in sync to participate in consensus. Another factor to consider is
Would avoid repeated "in sync" and instead use "... validators to remain in sync in order to participate in consensus."
src/ripple/app/misc/FeeEscalation.md
Outdated
observations. However, some factors, such as transactions volumes,
can increase consensus duration. This is because rippled performs
more work as transaction volume increases. Under sufficient load this
tends to increase consensus duration.
Would add "time" here: "Under sufficient load this time tends to ...", because we refer to "the time" in the sentence below, which is now pushed far from the reference to consensus duration.
Duration is more precise, though "time" as used here means duration among other things. So duration is more appropriate here and I changed other instances of "time" to reflect that.
@ximinez I'd like to cover this PR with you next week, please. There are some things about fees that could probably be clarified a bit about network health. Thanks for reviewing this. Let's not merge this until then. Also, @Bronek I'm doing something here that's generally not OK--making changes to a PR after it's submitted. But this is a small PR, just for documentation. Anyway, normally what we do is submit a PR once we think it's feature complete, and only make changes based on review suggestions. So, "do as I say not as I do," please. :-)
I think most PRs are actually a little more flexible on this point - it's fine to make (justified) changes to a PR after it's submitted, but it does mean that the PR will generally need re-review/re-approval before merging. That is perfectly OK though.
Given
I've set this PR to "draft" status to ensure it isn't merged until deemed ready.
@Bronek @HowardHinnant @ximinez @intelliot I just refined the document a bit more, and fixed a typo. Please scan again.
src/ripple/app/misc/FeeEscalation.md
Outdated
often coincides with new ledgers with zero transactions.
A variety of factors contribute to consensus health.

Note that this is not necessarily the duration between
ledger closings, as consensus usually starts some amount of time after
a ledger opens.
Should this sentence be moved up? It seems disconnected here.
I just removed the sentence entirely. It's not a useful detail here.
Note: this PR has changed since ximinez's last review, so it needs a re-review.
I'm ok with this as it is, but I think it could be better with a little clarification of the last sentence.
duration should be roughly 20 seconds. That is far above the normal.
If the network takes this long to close ledgers, then it is almost
certain that there is a problem with the network. This circumstance
often coincides with new ledgers with zero transactions.
So the consensus process takes >20 seconds, although no transactions were included in the ledger.
Can we list any factors that might cause this issue? Historically, have such problems occurred on the mainnet or other affiliated blockchain networks? Can we provide a link to such an example?
That goes beyond clarifying what stability is and gets into speculation and diagnostics.
I feel that giving such examples would provide more clarity. As it stands, the reader does not understand why the network could become unstable.
should largely remain in sync with one another. It is particularly
important for the validators to remain in sync, because that is required
for participation in consensus. However, the network tolerates some
validators being out of sync. Fundamentally, network health is a
Can we use the Reliability Score as a proxy for measuring the network health? It seems to indicate the degree of similarity in the calculations between validators.
I don't know what that is, and it brings me to a site that asks for my email address.
Sorry, the link is wrong. Here's the reference: https://xrpl.org/negative-unl.html#reliability-measurement
@ckeshava
The problem with the existing document is that it can be misinterpreted to mean that 5s latency in consensus is some extreme upper limit, beyond which the network is in a faulty state. This PR corrects the language and hopefully encourages approaching the issue with some nuance. I didn't really intend this to be an exhaustive treatment of all the ways that the network can have problems, or of the different diagnostics and measurements that can be done. That is actually quite a sizable topic. But for now I prefer that this stays concise and mainly clarifies the original statement.
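To make the nuance concrete, here is a minimal sketch of reading consensus duration as a graded signal rather than a single hard cutoff. It is not rippled code: the function name, the enum, and the two thresholds are hypothetical, with the 5 and 20 second figures borrowed only from the numbers mentioned in this thread.

```python
from enum import Enum


class ConsensusHealth(Enum):
    TYPICAL = "typical"
    ELEVATED = "elevated"
    LIKELY_PROBLEM = "likely problem"


# Illustrative thresholds based on figures mentioned in this thread.
# They are assumptions for this sketch, not constants from rippled.
SOFT_LIMIT_SECONDS = 5.0
PROBLEM_LIMIT_SECONDS = 20.0


def classify_consensus_duration(seconds: float) -> ConsensusHealth:
    """Map a consensus duration to a coarse, hypothetical health label."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    if seconds >= PROBLEM_LIMIT_SECONDS:
        # Around 20 seconds the text treats a network problem as almost certain.
        return ConsensusHealth.LIKELY_PROBLEM
    if seconds > SOFT_LIMIT_SECONDS:
        # Above 5 seconds is worth noting (e.g. under load), but not a fault.
        return ConsensusHealth.ELEVATED
    return ConsensusHealth.TYPICAL
```

The point of the middle band is that a duration above 5 seconds, for example under heavy transaction volume, is something to watch rather than proof of a faulty network.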
@ckeshava for the ideas + open questions that you have, please feel free to open a new issue (or better - a PR with your proposed changes). They are likely outside the scope of this particular PR.
Update the documentation to describe network health with more nuance as well as context about related factors.
High Level Overview of Change
The existing documentation describes network health at a very high level with no nuance that reflects the reality. This update better defines network health as well as provides context about related factors.
Context of Change
Type of Change