-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redesign Consensus Simulation Framework (RIPD-1361) #2209
Conversation
@wilsonianb and @scottschurr I've tagged you both and would request you focus on the prepartion commit |
Codecov Report
@@ Coverage Diff @@
## develop #2209 +/- ##
===========================================
+ Coverage 70.22% 70.23% +0.01%
===========================================
Files 692 692
Lines 50961 50977 +16
===========================================
+ Hits 35787 35804 +17
+ Misses 15174 15173 -1
Continue to review full report at Codecov.
|
std::chrono::milliseconds delay, | ||
Generator &g) | ||
{ | ||
std::size_t const size = peers.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: unused variable 'size' [-Wunused-variable]
SizePDF unlSizePDF, | ||
Generator& g) | ||
{ | ||
std::size_t const size = peers.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: unused variable 'size' [-Wunused-variable]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See earlier assert(peers.size() == ranks.size());
suggestion. Also note there are steps 2 and 3, but not 1.
src/ripple/consensus/Validations.h
Outdated
/** Determine the preferred ledger based on its support | ||
|
||
@param current The current ledger the node follows | ||
@param ledgerCounts Ledger IDs and corresponding counts of support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ledgerCounts
--> dist
?
src/ripple/consensus/Validations.h
Outdated
// Or stick with ours on a tie | ||
if ((it.second > netLgrCount) || | ||
((it.second == netLgrCount) && | ||
((it.first == current) || (it.first > netLgr)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like (it.first == current) || (it.first > netLgr)
is new (previously just (it.first == current)
).
Should it instead be (it.first == current) || ((it.first > netLgr) && (netLgr != current))
?
As is, I think you can get a netLgr
of either current
or a sequence later than current (assuming they have the same count) depending of which LedgerID
appears later in the hash_map
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I agree with your suggested change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing only the [FOLD] Prepare for CSF redesign
commit. I left some comments and thoughts, but saw no particular problems. Looks good. 👍
@@ -458,10 +458,12 @@ struct Ledger | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The document contains a few references to the "Ripple network" and "Ripple Consensus Ledger". I'm wondering if those should be changed to "XRP Ledger"? Possibly a marketing or @mDuo13 question...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The references to "Ripple network" and "Ripple Consensus Ledger" are still in the document. @mDuo13, is it important to address this? Or should we let it slide for now?
src/ripple/consensus/Validations.h
Outdated
// Or stick with ours on a tie | ||
if ((it.second > netLgrCount) || | ||
((it.second == netLgrCount) && | ||
((it.first == current) || (it.first > netLgr)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
condition is slightly different from the original inline code it replaces in RCLConsensus.cpp. I'm unsure whether that difference is intentional, so I thought I'd just call it out. The replacement is not quite a simple refactor...
@@ -147,7 +147,7 @@ RCLConsensus::Adaptor::relay(RCLCxPeerPos const& peerPos) | |||
} | |||
|
|||
void | |||
RCLConsensus::Adaptor::relay(RCLCxTx const& tx) | |||
RCLConsensus::Adaptor::share(RCLCxTx const& tx) | |||
{ | |||
// If we didn't relay this transaction recently, relay it to all peers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still want to say "relay" in this comment? Or should it say "share"?
Also, there are a few uses of "relay" in RCLValidations.cpp and one in RCLValidations.h. Do you want to change those to "share" as well?
I don't feel strongly on this, just asking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with share
in the generic context and left relay
in the ripple specific implementation (RCL*
). The thought is that the generic version can use a variety of routing strategies to send messages in support of the consensus algorithm, so relay
was too specific. However, since rippled
's overlay and hashrouter use relay
, I stuck with that name.
I agree that having two names is not ideal, but I felt the generic vs concrete boundary was an acceptable spot to diverge.
@@ -37,6 +37,8 @@ class RCLCxLedger | |||
public: | |||
//! Unique identifier of a ledger | |||
using ID = LedgerHash; | |||
//! Sequence number of a ledger | |||
using Seq = LedgerIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you eventually intend to change LedgerMaster to use Seq
instead of LedgerIndex
? Not insisting, just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially. LedgerMaster
and NetworkOps
still have functionality that is related to consensus that should arguably be moved to the generic implementation and that might warrant the change.
@@ -55,28 +57,28 @@ class RCLCxLedger | |||
} | |||
|
|||
//! Sequence number of the ledger. | |||
auto const& | |||
Seq const& |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for explicit return types!
@@ -874,7 +875,7 @@ Consensus<Adaptor>::getJson(bool full) const | |||
if (mode_.get() != ConsensusMode::wrongLedger) | |||
{ | |||
ret["synched"] = true; | |||
ret["ledger_seq"] = previousLedger_.seq() + 1; | |||
ret["ledger_seq"] = static_cast<std::uint32_t>(previousLedger_.seq())+ 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing a couple of these static_cast<>
s. That's fine, but you might consider adding a method named, say, seqAsUInt()
to your Ledger_t
implementations. That would keep the static_cast<>
inside the class responsible for supplying the data. Just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave as is for now since it is just for JSON/debug support. I intend (eventually) to revisit how that is done in the generic code.
@@ -192,6 +222,7 @@ class Validations | |||
decay_result_t<decltype (&Validation::ledgerID)(Validation)>; | |||
using NodeKey = decay_result_t<decltype (&Validation::key)(Validation)>; | |||
using NodeID = decay_result_t<decltype (&Validation::nodeID)(Validation)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RCLConsensus.h is using NodeID_t
instead of NodeID
. Is it worth making that consistent? So, for example SeqType
would be Seq_t
? Just for your consideration. Personally, I really like it when a given type retains the same name for most or all of the code base. That makes it way easier to grep for instances. It's nice when it works that way, but it's hard to enforce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I should revisit this. Two follow-ups
- Is the
decay_result_t
style acceptable? Or do you prefer the explicitusing
style? - If its the latter, I added the
_t
suffix becauseNodeID
is an existingripple
type, so wasn't sure how to define an aliasusing NodeID = NodeID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm completely fine with the decay_result_t
style. If you want to use the explicit using
style, consider
using NodeID = ripple::NodeID;
I believe you can also use
using NodeID = NodeID;
but you'll leave your readers a bit puzzled about what you're up to...
using TxSet_t = RCLTxSet; | ||
using PeerPosition_t = RCLCxPeerPos; | ||
using Ledger = RCLCxLedger; | ||
using NodeID = ripple::NodeID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error: declaration of 'using NodeID = using NodeID = class ripple::base_uint<160ul, ripple::detail::NodeIDTag>' [-fpermissive]
using NodeID = ripple::NodeID;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh. Windows was happy at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bummer. Clang on macOS was okay with it too. Sorry for the misdirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. I'll just drop that change for now and revisit separately.
51098cf
to
ea0fe89
Compare
Force pushed and dropped 51098cf |
The last commit fixes a bug in the simulation framework uncovered in other simulations that are not part of this review. |
1d946c3
to
53d4bda
Compare
Rebased on 0.80.0-rc2 |
53d4bda
to
73dba2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this review took so long. There is a lot of code here! I'm not convinced my particular review really brought much of value, but I noted a few things you might choose to look at. Interesting stuff.
src/test/csf/README.md
Outdated
describing, running and analyzing simulations of the consensus algorithm in a | ||
controlled manner. It is also used to unit test the generic Ripple consensus | ||
algorithm implementation. The framework is in its early stages, so the design | ||
and supported features is subject to change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
micro nit: is -> are?
src/test/csf/README.md
Outdated
each `Peer` has closed one additional ledger. After closing the additional | ||
ledger, the `Peer` stops participating in consensus. The first call is used to | ||
ensure a more useful prior state of all `Peer`s. After the transaction | ||
submission, the second call to `run` results in additional ledger that accepts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub-nit: in "one" or "an" additional...?
src/ripple/consensus/LedgerTiming.h
Outdated
@@ -63,14 +63,14 @@ auto constexpr decreaseLedgerTimeResolutionEvery = 1; | |||
@pre previousResolution must be a valid bin | |||
from @ref ledgerPossibleTimeResolutions | |||
*/ | |||
template <class duration> | |||
template <class duration, class seq> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually expect types to have initial caps in our code base. So I'd expect to see class Duration, class Seq
. I think that change might make the code inside the body a bit easier to read. Just a thought, not a requirement.
src/ripple/consensus/LedgerTiming.h
Outdated
{ | ||
assert(ledgerSeq); | ||
assert(ledgerSeq != seq{0}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requirements on type seq
are not very clear. It looks like it should be integral, and ledgerSeq
certainly suggests it should be integral. But if, right here, I insert...
static_assert (std::is_integral<seq>::value, "");
The static_assert
fails from an instantiation based here: src/test/csf/Peer.h:818
consensus.timerEntry(now());
Maybe this is expected? Would like to know what you think...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is basically integral, but it also supports tagged_integer
types which aren't exactly integral. I could try adding a concept check if you find that useful, but will at least add some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments will be good.
//------------------------------------------------------------------------------ | ||
/* | ||
This file is part of rippled: https://github.com/ripple/rippled | ||
Copyright (c) 2012-2017 Ripple Labs Inc-> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the "Inc" supposed to end with a period? I think a global replace went awry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes. Sorry and thanks for catching the aftermath!
src/test/csf/ledgers.h
Outdated
//! When the ledger closed (up to closeTimeResolution) | ||
NetClock::time_point closeTime; | ||
|
||
//! Whether consenssus agreed on the close time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consenssus -> consensus?
src/test/csf/random.h
Outdated
return res; | ||
} | ||
|
||
/** Invokable that returns random samples from a range according to a discrete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that C++ is drifting toward the spelling "invocable" instead of "invokable".
src/test/csf/submitters.h
Outdated
namespace test { | ||
namespace csf { | ||
|
||
// Submitters are class for simulating submission of transactions to the network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class -> classes?
src/test/csf/submitters.h
Outdated
to succesive calls of distribution(), until stop. | ||
|
||
@tparam Distribution is a `UniformRandomBitGenerator` from the STL that | ||
is usesd by random distributions to generate random samples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usesd -> used?
src/test/csf/submitters.h
Outdated
|
||
@tparam Distribution is a `UniformRandomBitGenerator` from the STL that | ||
is usesd by random distributions to generate random samples | ||
@tparam Generator is a object with member |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a -> an?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the last commit in detail, and tested out the entire P/R. All looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagged a few more nits, but nothing that needs to be addressed. 👍
@@ -458,10 +458,12 @@ struct Ledger | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The references to "Ripple network" and "Ripple Consensus Ledger" are still in the document. @mDuo13, is it important to address this? Or should we let it slide for now?
src/ripple/consensus/LedgerTiming.h
Outdated
@@ -62,15 +61,22 @@ auto constexpr decreaseLedgerTimeResolutionEvery = 1; | |||
|
|||
@pre previousResolution must be a valid bin | |||
from @ref ledgerPossibleTimeResolutions | |||
|
|||
@tparam Rep Type representing number of ticks in std::chrono::duration | |||
@tparam Period An std::ratioe representing tick period in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ratioe -> ratio?
src/ripple/consensus/LedgerTiming.h
Outdated
std::chrono::duration | ||
@tparam Seq Unsigned integer-like type corresponding to the ledger sequence | ||
number. It should be comparable to 0 and support modular | ||
division. Built-in and tagged_integers are supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That last sentence really helps. Thanks.
// an extra non-consensus transaction). | ||
|
||
Sim sim; | ||
ConsensusParms parms; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this should be const
?
|
||
PeerGroup network = loner + clique; | ||
network.connect( | ||
network, round<milliseconds>(0.2 * parms.ledgerGRANULARITY)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not looking too closely, but I suspect the need for the round
is because:
parts.ledgerGranularity == 1 sec
(double) 0.2 * 1 sec produces (double) 0.2 sec
round<milliseconds>(0.2 sec) produces 200 milliseconds
If that's correct you could probably get away from the round
call by
- declaring parts.ledgerGRANULARITY as 1000 milliseconds (represented internally as 1000), and
- instead of multiply by 0.2, divide by 5, so you stay in the integer world.
Then I think you should automatically get 200 milliseconds without the round call.
src/test/csf/random.h
Outdated
/** Generate a vector of random samples | ||
|
||
@param size the size of the sample | ||
@param pdf the distribution to sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were considering changing pdf
(Probability Distribution Function or Portable Document Format) to rnd
(Random Number Distribution). I like rnd
and RND
much better than pdf
and PDF
. Is that change still in the cards?
argStream >> maxNumValidators; | ||
argStream >> delayCount; | ||
|
||
std::chrono::milliseconds delay(delayCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these three locals can still be const
.
src/test/csf/Digraph.h
Outdated
|
||
/** Directed graph | ||
|
||
Basic directed graph that uses an adjacency lists to represent out edges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lists -> list?
src/test/csf/Scheduler.h
Outdated
The clock is advanced to the time | ||
of the last delivered event. | ||
|
||
@return `true` if a event was processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could still be changed to an event...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments - LGTM
*/ | ||
template <class VertexName> | ||
void | ||
saveDot(std::ostream & out, VertexName&& vertexName) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not to over-engineer it, but couldn't this instead be defined as an ostream operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure how to do that conveniently and allow determining how to represent the vertex name at the call-site> I could try wrapping instead, but this seemed good enough for now.
std::cout << "Fully synchronized: " << std::boolalpha | ||
<< sim.synchronized() << "\n"; | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this test doesn't actually have any assertions, so the test runner aborts at the end. Maybe add a pass()
or some other trivial assertion ?
8bc1c62
to
212bd2d
Compare
Rebased on 0.80.0-rc3 |
212bd2d
to
59f28e6
Compare
Rebased on 0.80.0, squashed and marking as passed. |
- Separate `Scheduler` from `BasicNetwork`. - Add an event/collector framework for monitoring invariants and calculating statistics. - Allow distinct network and trust connections between Peers. - Add a simple routing strategy to support broadcasting arbitrary messages. - Add a common directed graph (`Digraph`) class for representing network and trust topologies. - Add a `PeerGroup` class for simpler specification of the trust and network topologies. - Add a `LedgerOracle` class to ensure distinct ledger histories and simplify branch checking. - Add a `Submitter` to send transactions in at fixed or random intervals to fixed or random peers. Co-authored-by: Joseph McGee
59f28e6
to
ad62c13
Compare
Rebased on 0.80.1 |
In 0.90.0-b1 |
This PR contains several changesets related to a redesigned consensus simulation framework (CSF) and reflects joint work between myself and @jmmcgee. Some highlights of the latest iteration of the framework:
Digraph
) class for representing network and trust topologiesPeerGroup
class for simpler specification of the trust and network topologiesLedgerOracle
class to ensure distinct ledger histories and simplify branch checkingSubmitter
to send transactions in at fixed or random intervals to fixed or random peersThe framework design is still in its early stages and will continue to be reworked in support of ongoing simulation studies.
For reviewers, there are two main requests
[FOLD] Prepare for CSF redesign:
1ca12e1 are satisfactory. The changes are non-behavior changing.src/test/csf
directory. Initially, reviewing the brief README may provide better context for the changes.