Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make minimum quorum Byzantine fault tolerant (RIPD-1461) #2093

Closed
wants to merge 1 commit into from

Conversation

wilsonianb
Copy link
Contributor

No description provided.

@codecov-io
Copy link

codecov-io commented Apr 25, 2017

Codecov Report

Merging #2093 into develop will increase coverage by <.01%.
The diff coverage is 92.85%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2093      +/-   ##
===========================================
+ Coverage    69.48%   69.49%   +<.01%     
===========================================
  Files          685      685              
  Lines        50520    50520              
===========================================
+ Hits         35105    35108       +3     
+ Misses       15415    15412       -3
Impacted Files Coverage Δ
src/ripple/app/misc/impl/ValidatorList.cpp 83.41% <100%> (+0.08%) ⬆️
src/ripple/app/misc/ValidatorList.h 98.27% <87.5%> (+1.66%) ⬆️
src/ripple/app/ledger/PendingSaves.h 93.54% <0%> (-6.46%) ⬇️
src/ripple/core/impl/Workers.cpp 100% <0%> (+1.13%) ⬆️
src/ripple/basics/DecayingSample.h 86.11% <0%> (+8.33%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87742a5...5abccfd. Read the comment docs.

Copy link
Collaborator

@bachase bachase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly interested in clarification on f.

The following is only a suggestion, but I'd find it useful to add more comments. I'm imagining coming back some months in the future and having to tease out what is going on.

if (localPubKey_.size() && ! localKeyListed &&
rankedKeys.size () > 1 && keyListings_.size () % 2 != 0)
++quorum;
// This minimum quorum of 2f + 1 guarantees safe overlap with the trusted
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define f?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather we don't mention f at all here:

// The minimum quorum guarantees safety with up 1/3 of the listed
// validators being malicious.

@@ -309,7 +309,8 @@ class ValidatorList
std::function<void(PublicKey const&, bool)> func) const;

static std::size_t
calculateQuorum (std::size_t nTrustedKeys);
calculateMinimumQuorum (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding doxygen comments since calculateMinimumQuorum is public.

Copy link
Contributor Author

@wilsonianb wilsonianb Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OR... what do you think about calculateMinimumQuorum being private?
I'd remove testCalculateMinimumQuorum and move most of its checks down here:
6ef1f30#diff-03bfc5ce29ad8cf375088f420df2649dR780

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your call. If you are comfortable getting coverage via that other spot, seems fine to me.

// Use 80% for large values of n, but have special cases for small numbers.
constexpr std::array<std::size_t, 10> quorum{{ 0, 1, 2, 2, 3, 3, 4, 5, 6, 7 }};
// Use 2f + 1 for large values of n, but have special cases for small numbers.
constexpr std::array<std::size_t, 6> quorum{{ 0, 1, 2, 2, 3, 3 }};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the number of special cases change from 10 to 6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly because the special cases stopped being special. For 7, 8, and 9 the special quorum was the same as the new minimum. 6's special quorum is one less than the minimum, and David suggested we be conservative with numbers >5.

Copy link
Contributor

@nbougalis nbougalis Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is confusing... first, there's no n, and I see nothing about 2f+1 anywhere in the code, although I do see what looks like 2f/3 + 1. Also, it says we have special cases, but doesn't explain why. But I digress...

There's no reason for a table even, since n/2+1 gives the following (calc):

n quorum[n] n/2+1 OK?
0 0 1
1 1 1
2 2 2
3 2 2
4 3 3
5 3 3

With the exception of quorum[0] this matches perfectly with our "special cases". And I am curious as to why we return 0. That only seems possible in the case where nListedKeys == 0 && unlistedLocal == false, so what case is that? Standalone mode? Startup of the first validator of a fresh network?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ideone calculation is doing i/2 + 1 as opposed to 2/3 * i + 1.
Are you suggesting that we calculate a 51% quorum for smaller values as opposed to using the std::array? I see how that would work.

Copy link
Contributor

@nbougalis nbougalis Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I edited my comment; I meant to say we don't even need the table - n/2+1 would work in the sense that it produces the same result as our existing "special case" table (except for the case of n==0) and makes it simpler to express what we're doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's possible to run with an empty [validators] config section and have zero listed validators.
There also don't appear to be listed validators when running in standalone mode, so I'll keep setting the quorum to zero in that case.

}
else
// Do not require 80% quorum for less than 10 trusted validators
if (size >= 10)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this 10 relate to the explicit sizes used for 6 listed keys in calculateMinimumQuorum?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also suggest changing if(size >=10) to if(rankedKeys.size() > 10). I know you set that in the line above, but those lines move apart in future refactorings, it would be best to make it clear the minimum 10 is relative to the rankedKeys size rather than say the listed key size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we just had the custom quorums (less than 80%) for <10 validators. Now there's also custom quorums (less than the normal minimum) for <=5 validators.

if (publisherLists_.size() == 1)
{
// Try to raise the quorum to at least 80% of the trusted set
std::size_t const targetQuorum = size - size / 5;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If size >= 10, is it possible for targetQuorum to be less than the minimum quorum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The declaration and if should be collapsed to:

quorum = std::max(quorum, size - size / 5);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can happen if rankedKeys.size() (the number of potentially trusted validators) is much lower than keyListings_.size() (the number of listed validators).
The check below makes sure we won't use the targetQuorum if that's the case.

if (localPubKey_.size() && ! localKeyListed &&
rankedKeys.size () > 1 && keyListings_.size () % 2 != 0)
++quorum;
// This minimum quorum of 2f + 1 guarantees safe overlap with the trusted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather we don't mention f at all here:

// The minimum quorum guarantees safety with up 1/3 of the listed
// validators being malicious.

if (publisherLists_.size() == 1)
{
// Try to raise the quorum to at least 80% of the trusted set
std::size_t const targetQuorum = size - size / 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The declaration and if should be collapsed to:

quorum = std::max(quorum, size - size / 5);

// Use 80% for large values of n, but have special cases for small numbers.
constexpr std::array<std::size_t, 10> quorum{{ 0, 1, 2, 2, 3, 3, 4, 5, 6, 7 }};
// Use 2f + 1 for large values of n, but have special cases for small numbers.
constexpr std::array<std::size_t, 6> quorum{{ 0, 1, 2, 2, 3, 3 }};
Copy link
Contributor

@nbougalis nbougalis Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is confusing... first, there's no n, and I see nothing about 2f+1 anywhere in the code, although I do see what looks like 2f/3 + 1. Also, it says we have special cases, but doesn't explain why. But I digress...

There's no reason for a table even, since n/2+1 gives the following (calc):

n quorum[n] n/2+1 OK?
0 0 1
1 1 1
2 2 2
3 2 2
4 3 3
5 3 3

With the exception of quorum[0] this matches perfectly with our "special cases". And I am curious as to why we return 0. That only seems possible in the case where nListedKeys == 0 && unlistedLocal == false, so what case is that? Standalone mode? Startup of the first validator of a fresh network?

Copy link
Collaborator

@bachase bachase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional questions for when ranked keys is much lower than listed keys.

// reduce the trusted set size so that the quorum represents
// at least 80%
size = quorum * 1.25;
// Use all eligible keys if there is only one trusted list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me why the single trusted list gets special status?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more that for multiple trusted lists we are motivated to reduce the number of trusted validators so that we are only trusting those included on the most lists.

if (unlistedLocal)
++nListedKeys;

// Guarantee safety with up to 1/3 listed validators being malicious.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth documenting the reason for 1/3?

{
// Reduce the trusted set size so that the quorum represents
// at least 80%
size = quorum * 1.25;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is new with these changes, but can't size end up greater than rankedKeys_.size()?

For example, suppose there are

  • More than 1 publisher lists
  • 20 listed keys
  • 10 ranked keys
  • This node has an unlisted key

Then isn't the minimum quorum 14, but the calculated size = 14 * 1.25 = 17? If so, I think that would make for an infinite loop down on line 450.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size can be greater than rankedKeys_.size()
That loop should be fine in that case since it is iterating rankedKeys. The result is just that we end up with fewer trusted keys than size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh, misread that loop. Thanks!

// Reduce the trusted set size so that the quorum represents
// at least 80%
size = quorum * 1.25;
}
}

if (minimumQuorum_ && (seenValidators.empty() ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question to the above, but is setting quorum = *minimumQuorum_ dangerous for the extreme revoked key case?

Suppose

  • More than 1 publisher list
  • 10 list keys
  • 7 ranked keys
  • This node has unlisted key
    Calculated minimum quorum is 8, so rankedKeys.size() < 8 would set quorum to minimumQuorum_, which might be 3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting quorum = *minimumQuorum_ removes the safety guarantees from calculateMinimumQuorum. The quorum will not be Byzantine fault tolerant and may simply be forkable (<51%).
The quorum command line option used to set minimumQuorum_ is advertised as such
https://github.com/ripple/rippled/blob/develop/src/ripple/app/main/Main.cpp#L231

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider logging at warning if quorum > minimumQuorum_. Two tangential questions: first, under what circumstances would we use --quorum and second, should the description of the command include a marker (e.g. "(EXPERT USE ONLY)")?

Copy link
Contributor Author

@wilsonianb wilsonianb Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're not bumping the quorum as an unlisted validator if there are less than 5, we might not need --quorum.
https://github.com/ripple/rippled/pull/2093/files#diff-7f8c7a926e17debbef064bc5c2f572dbR420
Previously, it was impossible to start up a new set of validators (reset the altnet) without specifying --quorum.
Note that I think --quorum acted the same pre-dynamic unl. Like [validation_quorum], it was the absolute minimum quorum you were willing to allow, which could possibly be unsafe.

return quorum[nTrustedKeys];

return nTrustedKeys - nTrustedKeys / 5;
if (nListedKeys == 0)
Copy link
Contributor

@nbougalis nbougalis Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This make me raise an eyebrow.

I'd be more inclined to simply remove the if and return 1 from the if below. I don't think anything would break and I know your tests suggest nothing does. @JoelKatz, @bachase, comments?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wilsonianb noted that standalone mode operates with an empty list. In that case, it would never fully validate a ledger. I'm not sure if that is a problem or not. If anything, that might be better configured via the "EXPERT ONLY" minimumQuorum, and leave this returning 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried standalone mode with a quorum of 1 (instead of 0), and it closed ledgers without any issue.
https://ripple.com/build/stand-alone-mode/#advancing-ledgers-in-stand-alone-mode

Copy link
Collaborator

@bachase bachase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@JoelKatz
Copy link
Collaborator

👍

@wilsonianb wilsonianb added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jul 11, 2017
@wilsonianb
Copy link
Contributor Author

Squashed and rebased on 0.80.0-b1
@nbougalis I think I addressed your comments, but let me know if I prematurely marked this as passed.

@seelabs
Copy link
Collaborator

seelabs commented Jul 20, 2017

In 0.80.0-b2

@seelabs seelabs closed this Jul 20, 2017
@wilsonianb wilsonianb deleted the bft-quorum branch May 15, 2018 22:14
@yassinesakri
Copy link

cool stuff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants