Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign Consensus Simulation Framework (RIPD-1361) #2209

Closed
wants to merge 1 commit into from

Conversation

bachase
Copy link
Collaborator

@bachase bachase commented Aug 18, 2017

This PR contains several changesets related to a redesigned consensus simulation framework (CSF) and reflects joint work between myself and @jmmcgee. Some highlights of the latest iteration of the framework:

  • Adds an event/collector framework for monitoring invariants and doing calculating statistics.
  • Allows distinct network and trust connections between Peers
  • Adds a simple routing strategy to support broadcasting arbitrary messages
  • Adds a common directed graph (Digraph) class for representing network and trust topologies
  • Adds a PeerGroup class for simpler specification of the trust and network topologies
  • Adds a LedgerOracle class to ensure distinct ledger histories and simplify branch checking
  • Adds a Submitter to send transactions in at fixed or random intervals to fixed or random peers

The framework design is still in its early stages and will continue to be reworked in support of ongoing simulation studies.

For reviewers, there are two main requests

  1. Ensure the changes to generic consensus in [FOLD] Prepare for CSF redesign: 1ca12e1 are satisfactory. The changes are non-behavior changing.
  2. Provide feedback on the design of the CSF. This is probably best done by reviewing the flattened PR and focusing on the src/test/csf directory. Initially, reviewing the brief README may provide better context for the changes.

@bachase
Copy link
Collaborator Author

bachase commented Aug 18, 2017

@wilsonianb and @scottschurr I've tagged you both and would request you focus on the prepartion commit [FOLD] Prepare for CSF redesign: 1ca12e1. Reviewing the larger redesign is up to you; I'll solicit volunteers for that larger task.

@codecov-io
Copy link

codecov-io commented Aug 18, 2017

Codecov Report

Merging #2209 into develop will increase coverage by 0.01%.
The diff coverage is 90%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2209      +/-   ##
===========================================
+ Coverage    70.22%   70.23%   +0.01%     
===========================================
  Files          692      692              
  Lines        50961    50977      +16     
===========================================
+ Hits         35787    35804      +17     
+ Misses       15174    15173       -1
Impacted Files Coverage Δ
src/ripple/basics/tagged_integer.h 94.73% <ø> (-5.27%) ⬇️
src/ripple/app/consensus/RCLCxLedger.h 80% <ø> (ø) ⬆️
src/ripple/app/consensus/RCLConsensus.h 80% <ø> (ø) ⬆️
src/ripple/consensus/LedgerTiming.h 92.3% <100%> (-3.53%) ⬇️
src/ripple/consensus/Validations.h 98.47% <100%> (+0.09%) ⬆️
src/ripple/consensus/DisputedTx.h 97.26% <100%> (+0.03%) ⬆️
src/ripple/app/consensus/RCLConsensus.cpp 66.74% <25%> (+0.97%) ⬆️
src/ripple/consensus/Consensus.h 88.86% <90%> (-0.23%) ⬇️
src/ripple/basics/DecayingSample.h 77.77% <0%> (-8.34%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6dc79c2...ad62c13. Read the comment docs.

std::chrono::milliseconds delay,
Generator &g)
{
std::size_t const size = peers.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: unused variable 'size' [-Wunused-variable]

SizePDF unlSizePDF,
Generator& g)
{
std::size_t const size = peers.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: unused variable 'size' [-Wunused-variable]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier assert(peers.size() == ranks.size()); suggestion. Also note there are steps 2 and 3, but not 1.

/** Determine the preferred ledger based on its support

@param current The current ledger the node follows
@param ledgerCounts Ledger IDs and corresponding counts of support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ledgerCounts --> dist ?

// Or stick with ours on a tie
if ((it.second > netLgrCount) ||
((it.second == netLgrCount) &&
((it.first == current) || (it.first > netLgr))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like (it.first == current) || (it.first > netLgr) is new (previously just (it.first == current)).
Should it instead be (it.first == current) || ((it.first > netLgr) && (netLgr != current))?
As is, I think you can get a netLgr of either current or a sequence later than current (assuming they have the same count) depending of which LedgerID appears later in the hash_map.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I agree with your suggested change.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing only the [FOLD] Prepare for CSF redesign commit. I left some comments and thoughts, but saw no particular problems. Looks good. 👍

@@ -458,10 +458,12 @@ struct Ledger
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The document contains a few references to the "Ripple network" and "Ripple Consensus Ledger". I'm wondering if those should be changed to "XRP Ledger"? Possibly a marketing or @mDuo13 question...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The references to "Ripple network" and "Ripple Consensus Ledger" are still in the document. @mDuo13, is it important to address this? Or should we let it slide for now?

// Or stick with ours on a tie
if ((it.second > netLgrCount) ||
((it.second == netLgrCount) &&
((it.first == current) || (it.first > netLgr))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if condition is slightly different from the original inline code it replaces in RCLConsensus.cpp. I'm unsure whether that difference is intentional, so I thought I'd just call it out. The replacement is not quite a simple refactor...

@@ -147,7 +147,7 @@ RCLConsensus::Adaptor::relay(RCLCxPeerPos const& peerPos)
}

void
RCLConsensus::Adaptor::relay(RCLCxTx const& tx)
RCLConsensus::Adaptor::share(RCLCxTx const& tx)
{
// If we didn't relay this transaction recently, relay it to all peers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still want to say "relay" in this comment? Or should it say "share"?

Also, there are a few uses of "relay" in RCLValidations.cpp and one in RCLValidations.h. Do you want to change those to "share" as well?

I don't feel strongly on this, just asking...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with share in the generic context and left relay in the ripple specific implementation (RCL*). The thought is that the generic version can use a variety of routing strategies to send messages in support of the consensus algorithm, so relay was too specific. However, since rippled's overlay and hashrouter use relay, I stuck with that name.

I agree that having two names is not ideal, but I felt the generic vs concrete boundary was an acceptable spot to diverge.

@@ -37,6 +37,8 @@ class RCLCxLedger
public:
//! Unique identifier of a ledger
using ID = LedgerHash;
//! Sequence number of a ledger
using Seq = LedgerIndex;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you eventually intend to change LedgerMaster to use Seq instead of LedgerIndex? Not insisting, just curious.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially. LedgerMaster and NetworkOps still have functionality that is related to consensus that should arguably be moved to the generic implementation and that might warrant the change.

@@ -55,28 +57,28 @@ class RCLCxLedger
}

//! Sequence number of the ledger.
auto const&
Seq const&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for explicit return types!

@@ -874,7 +875,7 @@ Consensus<Adaptor>::getJson(bool full) const
if (mode_.get() != ConsensusMode::wrongLedger)
{
ret["synched"] = true;
ret["ledger_seq"] = previousLedger_.seq() + 1;
ret["ledger_seq"] = static_cast<std::uint32_t>(previousLedger_.seq())+ 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing a couple of these static_cast<>s. That's fine, but you might consider adding a method named, say, seqAsUInt() to your Ledger_t implementations. That would keep the static_cast<> inside the class responsible for supplying the data. Just a thought.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave as is for now since it is just for JSON/debug support. I intend (eventually) to revisit how that is done in the generic code.

@@ -192,6 +222,7 @@ class Validations
decay_result_t<decltype (&Validation::ledgerID)(Validation)>;
using NodeKey = decay_result_t<decltype (&Validation::key)(Validation)>;
using NodeID = decay_result_t<decltype (&Validation::nodeID)(Validation)>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RCLConsensus.h is using NodeID_t instead of NodeID. Is it worth making that consistent? So, for example SeqType would be Seq_t? Just for your consideration. Personally, I really like it when a given type retains the same name for most or all of the code base. That makes it way easier to grep for instances. It's nice when it works that way, but it's hard to enforce.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should revisit this. Two follow-ups

  1. Is the decay_result_t style acceptable? Or do you prefer the explicit using style?
  2. If its the latter, I added the _t suffix because NodeID is an existing ripple type, so wasn't sure how to define an alias using NodeID = NodeID?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm completely fine with the decay_result_t style. If you want to use the explicit using style, consider

using NodeID = ripple::NodeID;

I believe you can also use

using NodeID = NodeID;

but you'll leave your readers a bit puzzled about what you're up to...

using TxSet_t = RCLTxSet;
using PeerPosition_t = RCLCxPeerPos;
using Ledger = RCLCxLedger;
using NodeID = ripple::NodeID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error: declaration of 'using NodeID = using NodeID = class ripple::base_uint<160ul, ripple::detail::NodeIDTag>' [-fpermissive]
         using NodeID = ripple::NodeID;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh. Windows was happy at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bummer. Clang on macOS was okay with it too. Sorry for the misdirection.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I'll just drop that change for now and revisit separately.

@bachase bachase force-pushed the consensus-sim-flatten branch from 51098cf to ea0fe89 Compare August 23, 2017 18:16
@bachase
Copy link
Collaborator Author

bachase commented Aug 23, 2017

Force pushed and dropped 51098cf

@bachase
Copy link
Collaborator Author

bachase commented Aug 29, 2017

The last commit fixes a bug in the simulation framework uncovered in other simulations that are not part of this review.

@bachase bachase force-pushed the consensus-sim-flatten branch from 1d946c3 to 53d4bda Compare September 5, 2017 13:41
@bachase
Copy link
Collaborator Author

bachase commented Sep 5, 2017

Rebased on 0.80.0-rc2

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this review took so long. There is a lot of code here! I'm not convinced my particular review really brought much of value, but I noted a few things you might choose to look at. Interesting stuff.

describing, running and analyzing simulations of the consensus algorithm in a
controlled manner. It is also used to unit test the generic Ripple consensus
algorithm implementation. The framework is in its early stages, so the design
and supported features is subject to change.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micro nit: is -> are?

each `Peer` has closed one additional ledger. After closing the additional
ledger, the `Peer` stops participating in consensus. The first call is used to
ensure a more useful prior state of all `Peer`s. After the transaction
submission, the second call to `run` results in additional ledger that accepts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sub-nit: in "one" or "an" additional...?

@@ -63,14 +63,14 @@ auto constexpr decreaseLedgerTimeResolutionEvery = 1;
@pre previousResolution must be a valid bin
from @ref ledgerPossibleTimeResolutions
*/
template <class duration>
template <class duration, class seq>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually expect types to have initial caps in our code base. So I'd expect to see class Duration, class Seq. I think that change might make the code inside the body a bit easier to read. Just a thought, not a requirement.

{
assert(ledgerSeq);
assert(ledgerSeq != seq{0});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements on type seq are not very clear. It looks like it should be integral, and ledgerSeq certainly suggests it should be integral. But if, right here, I insert...

static_assert (std::is_integral<seq>::value, "");

The static_assert fails from an instantiation based here: src/test/csf/Peer.h:818

consensus.timerEntry(now());

Maybe this is expected? Would like to know what you think...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is basically integral, but it also supports tagged_integer types which aren't exactly integral. I could try adding a concept check if you find that useful, but will at least add some comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments will be good.

//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2012-2017 Ripple Labs Inc->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the "Inc" supposed to end with a period? I think a global replace went awry.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. Sorry and thanks for catching the aftermath!

//! When the ledger closed (up to closeTimeResolution)
NetClock::time_point closeTime;

//! Whether consenssus agreed on the close time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consenssus -> consensus?

return res;
}

/** Invokable that returns random samples from a range according to a discrete
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that C++ is drifting toward the spelling "invocable" instead of "invokable".

namespace test {
namespace csf {

// Submitters are class for simulating submission of transactions to the network
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class -> classes?

to succesive calls of distribution(), until stop.

@tparam Distribution is a `UniformRandomBitGenerator` from the STL that
is usesd by random distributions to generate random samples
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usesd -> used?


@tparam Distribution is a `UniformRandomBitGenerator` from the STL that
is usesd by random distributions to generate random samples
@tparam Generator is a object with member
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a -> an?

Copy link
Contributor

@HowardHinnant HowardHinnant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the last commit in detail, and tested out the entire P/R. All looks good!

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagged a few more nits, but nothing that needs to be addressed. 👍

@@ -458,10 +458,12 @@ struct Ledger
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The references to "Ripple network" and "Ripple Consensus Ledger" are still in the document. @mDuo13, is it important to address this? Or should we let it slide for now?

@@ -62,15 +61,22 @@ auto constexpr decreaseLedgerTimeResolutionEvery = 1;

@pre previousResolution must be a valid bin
from @ref ledgerPossibleTimeResolutions

@tparam Rep Type representing number of ticks in std::chrono::duration
@tparam Period An std::ratioe representing tick period in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ratioe -> ratio?

std::chrono::duration
@tparam Seq Unsigned integer-like type corresponding to the ledger sequence
number. It should be comparable to 0 and support modular
division. Built-in and tagged_integers are supported.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That last sentence really helps. Thanks.

// an extra non-consensus transaction).

Sim sim;
ConsensusParms parms;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this should be const?


PeerGroup network = loner + clique;
network.connect(
network, round<milliseconds>(0.2 * parms.ledgerGRANULARITY));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not looking too closely, but I suspect the need for the round is because:

parts.ledgerGranularity == 1 sec
(double) 0.2 * 1 sec produces (double) 0.2 sec
round<milliseconds>(0.2 sec) produces 200 milliseconds

If that's correct you could probably get away from the round call by

  1. declaring parts.ledgerGRANULARITY as 1000 milliseconds (represented internally as 1000), and
  2. instead of multiply by 0.2, divide by 5, so you stay in the integer world.

Then I think you should automatically get 200 milliseconds without the round call.

/** Generate a vector of random samples

@param size the size of the sample
@param pdf the distribution to sample
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were considering changing pdf (Probability Distribution Function or Portable Document Format) to rnd (Random Number Distribution). I like rnd and RND much better than pdf and PDF. Is that change still in the cards?

argStream >> maxNumValidators;
argStream >> delayCount;

std::chrono::milliseconds delay(delayCount);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these three locals can still be const.


/** Directed graph

Basic directed graph that uses an adjacency lists to represent out edges.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lists -> list?

The clock is advanced to the time
of the last delivered event.

@return `true` if a event was processed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could still be changed to an event...

Copy link
Contributor

@mellery451 mellery451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments - LGTM

*/
template <class VertexName>
void
saveDot(std::ostream & out, VertexName&& vertexName) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to over-engineer it, but couldn't this instead be defined as an ostream operator?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how to do that conveniently and allow determining how to represent the vertex name at the call-site> I could try wrapping instead, but this seemed good enough for now.

std::cout << "Fully synchronized: " << std::boolalpha
<< sim.synchronized() << "\n";
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this test doesn't actually have any assertions, so the test runner aborts at the end. Maybe add a pass() or some other trivial assertion ?

@bachase bachase force-pushed the consensus-sim-flatten branch from 8bc1c62 to 212bd2d Compare September 26, 2017 12:20
@bachase
Copy link
Collaborator Author

bachase commented Sep 26, 2017

Rebased on 0.80.0-rc3

@bachase bachase force-pushed the consensus-sim-flatten branch from 212bd2d to 59f28e6 Compare October 25, 2017 14:02
@bachase bachase added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Oct 25, 2017
@bachase
Copy link
Collaborator Author

bachase commented Oct 25, 2017

Rebased on 0.80.0, squashed and marking as passed.

@bachase bachase removed the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Nov 2, 2017
- Separate `Scheduler` from `BasicNetwork`.
- Add an event/collector framework for monitoring invariants and calculating statistics.
- Allow distinct network and trust connections between Peers.
- Add a simple routing strategy to support broadcasting arbitrary messages.
- Add a common directed graph (`Digraph`) class for representing network and trust topologies.
- Add a `PeerGroup` class for simpler specification of the trust and network topologies.
- Add a `LedgerOracle` class to ensure distinct ledger histories and simplify branch checking.
- Add a `Submitter` to send transactions in at fixed or random intervals to fixed or random peers.

Co-authored-by: Joseph McGee
@bachase bachase force-pushed the consensus-sim-flatten branch from 59f28e6 to ad62c13 Compare November 30, 2017 22:14
@bachase bachase added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Nov 30, 2017
@bachase
Copy link
Collaborator Author

bachase commented Nov 30, 2017

Rebased on 0.80.1

@seelabs
Copy link
Collaborator

seelabs commented Dec 1, 2017

In 0.90.0-b1

@seelabs seelabs closed this Dec 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants