-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validations refactor (RIPD-1412) #2084
Conversation
Documentation preview http://bachase.github.io/ripd1412/index.html |
Codecov Report
@@ Coverage Diff @@
## develop #2084 +/- ##
===========================================
+ Coverage 68.98% 69.14% +0.15%
===========================================
Files 684 685 +1
Lines 50461 50530 +69
===========================================
+ Hits 34812 34938 +126
+ Misses 15649 15592 -57
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fairly confident there's a locking issue that needs attention in RCLValidations::onStale()
. Other than that the comments are mostly advisory and my own personal opinions.
src/ripple/app/misc/Validations.cpp
Outdated
} | ||
|
||
std::shared_ptr<ValidationSet> findSet (uint256 const& ledgerHash) | ||
ValidationSet * findSet (uint256 const& ledgerHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both findCreateSet
and findSet
I wonder if there's some risk that a returned ValidationSet
might be close to the end of its 10 minute life span. If the time of most recent reference is more important than time-since-creation you could consider calling touch()
on the returned reference. That will update the time on the returned reference and move it to the front of the chronological list.
If life span is not a concern then it might be nice to add a comment about why we shouldn't worry about it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this was temporary while I refactored. In the new version of the code in Validations
, the equivalent lookup into the aged map still returns a ValidationSet *
, but no public methods can return that set so the lifetime is guaranteed.
As you might be thinking, that is only "enforced" by convention and a soon-to-be-added comment. An alternative I considered was to completely restrict access to the map and only allow iteration over it via a functor.
src/ripple/app/misc/Validations.cpp
Outdated
{ | ||
return mValidations.fetch (ledgerHash); | ||
auto it = mValidations.find(ledgerHash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const
?
src/ripple/app/misc/Validations.cpp
Outdated
return false; | ||
|
||
auto it = mCurrentValidations.find (*pubKey); | ||
auto res = mCurrentValidations.emplace(*pubKey,val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const
?
src/ripple/app/misc/Validations.cpp
Outdated
{ | ||
std::vector <NetClock::time_point> times; | ||
ScopedLockType sl (mLock); | ||
if (auto j = findSet (hash)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const
?
ScopedLockType sl(lock_); | ||
auto res = Validations::add(now(), *pubKey, val); | ||
|
||
// This is a duplicate validadtion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validadtion -> validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dad joke?
|
||
{ | ||
{ | ||
auto v = a.validation(Seq{1}, ID{1}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const
?
auto val3 = a.validation(Seq{3}, ID{3}); | ||
|
||
clock.advance(4s); | ||
auto val4 = a.validation(Seq{4}, ID{4}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const
here?
clock.advance(p.VALIDATION_CURRENT_LOCAL); | ||
|
||
// trigger iteration over current | ||
vals.current(a.now(), [](auto const&) {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test.
|
||
Validations::flush(); | ||
Validations::flush([&](RCLValidation && v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wince. When possible I try not to call outside functions with a lock
held. This code base doesn't hold to that rule very well, so I'll leave this for your consideration. But at this point the only place we need the ScopedLockType
is for the staleValidations_.emplace_back(std::move(v.val_));
inside the upcoming lambda.
Consider putting a lock inside the lambda (yes, the lock will thrash more) and postpone this ScopedLockType
to just before the if (anyNew && !writing_)
a few lines down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I move the lock inside, do I need to consider concurrent calls to RCLValidations::flush
? My understanding is that the lock protects all data members of the class, both in the old and new code. If another call to RCLValidations::flush
occurs, wouldn't this cause a data race?
As long as I document the semantics of flush
, I'm not too concerned about calling it while holding the lock and would prefer to avoid thrashing. If we are comfortable calling the external code in .emplace_back
while holding a lock, I think extending that comfort to the flush
function is acceptable. Does that seem reasonable or would you prefer I move the lock inside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may be looking at the code more closely than I am, but I'm not seeing any data races if the lock is inside the lambda. anyNew
is a local, so it is not subject to thread races.
If we put the lock inside the lambda, we also need to add a lock just in front of line 268 to protect the rest of the function. If that's what you're worried about then I agree. We need a lock there as well.
In terms of what is "outside code" my personal approach is to have these buckets:
std
library code.- non-
std
library, but well vetted, library code (like boost and beast). - code in this same class.
- everything else (including other functions/classes in this code base).
Most calls into std
library code I don't worry about. The characteristics of that code are stable and well understood. I get more nervous as I go further down the list because the code is less well characterized and more volatile. Note that, in my opinion, calling into our own code outside this class is the scariest. It is the most subject to change and we have the smallest understanding of the consequences of these changes.
Let me provide some more background for my concern. When I hold a lock and then call an outside (non-std
library) function I am, in effect, saying two things:
- It is okay if I hold this lock indefinitely, since I don't know how long this lock will be held.
- I strongly believe that there will be no lock order problems introduced by the outside code trying to acquire another lock or to re-acquire this lock.
I believe that you've checked all of the code that you're modifying at this moment. I'm sure it's all good right now.
My primary concern is that, as the outside code is maintained, those maintainers have no idea that this lock is being held. So problems like the ones I just described are easy to introduce during maintenance.
As I said, the existing code base is pretty bad (in my opinion) about holding locks over external calls. And sometimes holding a lock over an external call is just flat-out unavoidable. So you'll not be doing anything that isn't seen elsewhere in the code base. I suspect in this case it is avoidable (at the cost of some extra lock thrash). I'm just pointing it out for your consideration.
You know best how you should manage the lock, and I leave that to your discretion. I'm sure you'll make the best choice for the code base. Thanks for listening.
@@ -240,24 +240,45 @@ RCLValidations::onStale(RCLValidation&& v) | |||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about RCLValidations::onStale()
. In the old implementation the moral equivalent of onStale()
was always called with RCLValidations::lock_
held by a std::lock_guard
. Now all of the calls to onStale
are coming from outside RCLValidations
. So I'm fairly confident the callers don't hold the lock_
.
Any data members of RCLValidations
that are accessed by RCLValidations::doWrite()
need to always be protected by lock_
. That would include both staleValidations_
and writing_
.
It may be sufficient to simply grab the lock_
at the very top of onStale()
. I'm unsure because I don't know the requirements of all of the places that call onStale()
. The result might look like this:
{
ScopedLockType sl(lock_);
staleValidations_.emplace_back(std::move(v.val_));
if (writing_)
return;
writing_ = true;
}
app_.getJobQueue().addJob(...
Once the locking situation is cleared up I'll look hard at the code again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is that onStale
is only accessible and callable from the generic Validations
code and any calls into the Validations
code from RCLConsensus
require holding the lock for the duration of the call to the base class. The fact that this is opaque is a shortcoming of the CRTP design. I'll try to think of a way to make it clearer outside of just commenting. I'm open to suggestions too =).
@scottschurr has brought up some good points that might warrant switching away from the CRTP design. I'll take a day or two to see if it will stick and will potentially re-submit the review if it makes more sense. Feel free to put this on the back-burner until I comment further. I will note that I was partial to CRTP since I used it in the Consensus refactor. However, there was much more algorithmic specialization there. This class has a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aditionally --> Additionally in the Add generic Validations
commit message
src/ripple/consensus/Validations.h
Outdated
// Signing key of node that published the validation | ||
NodeKey key() const; | ||
|
||
// Identifier of node that published the validaton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validaton -> validation
src/ripple/consensus/Validations.h
Outdated
{ | ||
//! The number of trusted validations | ||
std::size_t count; | ||
//! The highest trusting node ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trusting -> trusted?
src/ripple/consensus/Validations.h
Outdated
if a time is specified | ||
|
||
@param t (Optional) Time used to determine staleness | ||
@param f Callable with signature (NodeKey const &, Validations const &) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea why this displays with |6|
in the docs?
Callable with signature (NodeKey const &, |6|Validations const &)
https://bachase.github.io/ripd1412/rippled/ref/ripple__Validations/current.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It had come up in the past, it seems to be a bug in how docca renders some templates.
public: | ||
/** Constructor | ||
|
||
@param p ValidationParms to control staleness/expiration of validaitons |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another |6|
in the docs
https://bachase.github.io/ripd1412/rippled/ref/ripple__Validations/Validations.html
src/ripple/consensus/Validations.h
Outdated
current, //< This was a new validation and was added | ||
repeat, //< Already had this validation | ||
stale, //< Not current or was older than current from this node | ||
sameSeq, //< Had a validation with same sequence number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These descriptions aren't appearing in the docs
https://bachase.github.io/ripd1412/rippled/ref/ripple__Validations/AddOutcome.html
|
||
/** Wrapper over STValidation for generic Validation code | ||
|
||
Wraps an STValidation::pointer for compatiblity with the generic validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compatiblity --> compatibility
auto val3 = a.validation(Seq{3}, ID{3}); | ||
|
||
clock.advance(4s); | ||
auto val4 = a.validation(Seq{4}, ID{4}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using Seq{3}
here also.
// first round a,b,c agree | ||
for(auto const & node : {a,b,c}) | ||
BEAST_EXPECT(AddOutcome::current == vals.add(node, Seq{1}, ID{1})); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add a trusted node d
with validations for different ledger ids.
src/ripple/consensus/Validations.h
Outdated
LedgerID const& currentLedger, | ||
LedgerID const& priorLedger, | ||
std::size_t cutoffBefore, | ||
beast::Journal& j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we got rid of the one trace log in this function, Validations.h wouldn't need to include <ripple/beast/utility/Journal.h>
} | ||
|
||
{ | ||
// Cutoff old sequence numberss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numberss --> numbers
I just pushed two new changesets. The first addresses many of the stylistic and I decided to have the generic I kept the history of commits for reviewers, but will gladly squash if that is easier. |
bool shouldRelay = false; | ||
|
||
// only add trusted or listed | ||
if (val->isTrusted() || pubKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wilsonianb This was my approach to eliminating the extra isCurrent
call. Can you spot check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could just be if (pubKey)
especially since we are using pubKey
below.
Updated documentation preview @ http://bachase.github.io/ripd1412/index.html |
I'm getting these compile warnings with the new commits
|
@bchase: Seeing this error on Travis build 9129.4:
Are you missing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I like the way the the policy-oriented design worked out. The encapsulation seemed like it worked well. I left a few comments, but nothing that I think desperately needs fixing. Thanks for including me in the review. I learned some stuff.
// allocated to validators that were listed by publishers we trusted but | ||
// that we didn't choose to trust. The shorter term plan was just to forward | ||
// untrusted validations if peers wanted them or if we had the | ||
// ability/bandwidth to. None of that was implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the comments by @JoelKatx.
Here's a note, not a request for a change. When I see a FIXME I envision a day where someone sets loose through our source code to address all the FIXMEs. They would either fix the issue (and remove the comment) or just remove the comment (because we decide it's not really a problem). A quick grep shows there are at least 31 FIXME comments between src/ripple and src/test. If you add in the VFALCO and NIKB TODOs and NOTEs you end up with over 300 items.
For our intrepid code warrior to be successful at that mission, each FIXME comment needs to clearly identify what the problem is and, maybe not the fix, but at least the imagined desirable outcome. For me, at least, if I were to charge out on that FIXME mission, I wouldn't know what to do with this comment because I wouldn't know which of the unimplemented stuff really needed to be implemented or why. We may not have that level of information for this comment. But, if we do, it would be nice to include it.
src/ripple/consensus/Validations.h
Outdated
@param ledgerID The identifier of the ledger | ||
@return A pointer to the set of validations, nullptr if none exist. | ||
@param f Callable with signature (NodeKey const &, Validation const &) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the C++ trivia front, @HowardHinnant: I vaguely remember that for C++20(?) folks are headed toward referring to things that support INVOKE as Invocable
rather than as Callable
? For example, in C++17 there's an is_invocable
type trait... Not that anything needs to be done here.
src/ripple/consensus/Validations.h
Outdated
auto const oldSeq = oldVal.seq(); | ||
auto const newSeq = val.seq(); | ||
std::uint32_t const oldSeq = oldVal.seq(); | ||
std::uint32_t const newSeq = val.seq(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test for narrowing conversions I changed this from an assignment-style initialization to a braced initialization. This is cool because you get a compilation failure on a narrowing conversion. Low and behold, if I use this for initialization...
std::uint32_t const oldSeq {oldVal.seq()};
I get a compile error out of RCLValidations.cpp:
In file included from src/ripple/app/consensus/RCLValidations.cpp:21:
In file included from src/ripple/app/consensus/RCLValidations.h:24:
src/ripple/consensus/Validations.h:384:45: error: non-constant-expression cannot
be narrowed from type 'std::size_t' (aka 'unsigned long') to
'std::uint32_t' (aka 'unsigned int') in initializer list
[-Wc++11-narrowing]
std::uint32_t const oldSeq {oldVal.seq()};
Some people might take that as a sign we should be using auto
instead of an explicit type. My personal take on it is that the uint32_t
declaration is dead on (we know that's the type of the sequence), and we should be getting more aggressive about using brace initialization instead of assignment-style initialization. That way the compiler will find our errors for us.
I'm not going to request any changes as a result of this experiment. I'm sure the code base is rife with this sort of problem. But I thought it was an interesting lesson. Personally I'm going to be putting more emphasis on brace initialization (but don't combine it with auto
!) in my own coding style after running this experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this! I intended to make seq std::uint32_t
everywhere and had accidentally left it as a std::size_t
in RCLValidation
.
class RCLValidation; | ||
class RCLValidationsPolicy; | ||
using RCLValidations = | ||
Validations<RCLValidationsPolicy, RCLValidation, std::mutex>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice forward declaration.
src/ripple/consensus/Validations.h
Outdated
using LedgerID = | ||
decay_result_t<decltype (&Validation::ledgerID)(Validation)>; | ||
using NodeKey = decay_result_t<decltype (&Validation::key)(Validation)>; | ||
using NodeID = decay_result_t<decltype (&Validation::nodeID)(Validation)>; | ||
using ValidationMap = hash_map<NodeKey, Validation>; | ||
|
||
using ScopedLock = std::lock_guard<MutexType>; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For extra credit you could add some static_assert
s for the methods that StalePolicy
and Validation
need to provide. But not required...
} | ||
|
||
{ | ||
// Test the node changing signing key, then reissuing a ledger | ||
|
||
// Confirm old ledger on hand, but not new ledger | ||
BEAST_EXPECT(vals.exists(ID{2})); | ||
BEAST_EXPECT(!vals.exists(ID{20})); | ||
BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{2})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this test will pass if numTrustedForLedger()
returns 2...
// ScopedLockType& acts as a reminder to future maintainers. | ||
void | ||
RCLValidations::doWrite(ScopedLockType&) | ||
RCLValidationsPolicy::doStaleWrite(ScopedLockType&) | ||
{ | ||
std::string insVal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insVal
and findSeq
could be static const
writing_ = true; | ||
doWrite(sl); | ||
staleWriting_ = true; | ||
doStaleWrite(sl); | ||
} | ||
|
||
// Handle the case where flush() is called while a queuedWrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... while a doStaleWrite() is...
?
} | ||
|
||
// NOTE: doWrite() must be called with mLock *locked*. The passed | ||
// NOTE: doStaleWrite() must be called with mLock *locked*. The passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mLock -> staleLock_?
{ | ||
using AddOutcome = RCLValidations::AddOutcome; | ||
|
||
AddOutcome res = validations.add(*pubKey, val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const res
?
|
||
}; | ||
|
||
/** Implements the StalePolicy policy class for adapating Validations in the RCL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adapating --> adapting
/** Flush current validations to disk before shutdown. | ||
|
||
@param remaining The remaining validations to flush | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is misaligned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble spotting it on my end. Which part is misaligned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return val_->getLedgerHash(); | ||
} | ||
|
||
/// Validated ledger's sequence number (0 if none) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test that this returns 0 if the sequence is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be changed to
return val_->isFieldPresent(sfLedgerSequence) ? val_->getFieldU32(sfLedgerSequence) : 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make it more explicit, but I believe that getFieldByValue
will already to that https://github.com/ripple/rippled/blob/develop/src/ripple/protocol/STObject.h#L573
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see I wasn't consistent with that for loadFee
. I'll add the explicit check.
return val_->getFieldU32(sfLedgerSequence); | ||
} | ||
|
||
/// Validation ledger's signing time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could change this to
/// Validation signing time
return val_->isTrusted(); | ||
} | ||
|
||
/// Set the prior ledger hash this validation is replacing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replacing --> following
2. Add the validation to the set of validations if current. | ||
3. If new and trusted, send the validation to the ledgerMaster. | ||
|
||
@param app Application object containing validations and ledgerMesger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ledgerMesger --> ledgerMaster
src/ripple/consensus/Validations.h
Outdated
|
||
beast::Journal j_; | ||
|
||
//! StalePolicy details providing now(),onStale() and flush callbacks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now(),onStale()
--> now(), onStale()
} | ||
|
||
// Handle the case where flush() is called while a queuedWrite | ||
// is already in progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this loop is necessary anymore.
Previously, it acted to make flush
synchronous since the write job was queued, but now we call doStaleWrite
directly.
https://github.com/ripple/rippled/blob/develop/src/ripple/app/misc/Validations.cpp#L443-L463
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a port of the changes in #2083. If flush
is called while an existing async job was queued or is now running, this would block until it completes. It would be nice to drop this if we don't care about whether that job finishes before returning.
@scottschurr: What is your opinion? Is it important for the flush
call to block until the potentially asynchronous call is complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wilsonianb and @bachase I think the loop is still useful. This is a thread synchronization point. A different thread may have called doStaleWrite()
. If that thread is running when we enter flush()
at shutdown, then staleWriting_
will be true
(line 89). So we'll skip calling doStaleWrtie()
ourselves and we'll let the other thread do our job for us. However when we exit this function all of the stale cache should be flushed to the database. The loop makes sure we don't leave the function until the stale cache is all safely in the database.
I had hoped the comments were sufficient to describe what we're up to. but maybe not. @wilsonianb, do you have a suggested change for the comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that makes sense to me.
I'd vote for this being clearly conveyed in the comment:
when we exit this function all of the stale cache should be flushed to the database. The loop makes sure we don't leave the function until the stale cache is all safely in the database.
bool shouldRelay = false; | ||
|
||
// only add trusted or listed | ||
if (val->isTrusted() || pubKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could just be if (pubKey)
especially since we are using pubKey
below.
{ | ||
JLOG(j.debug()) << "Val for " << hash << " from " | ||
<< toBase58(TokenType::TOKEN_NODE_PUBLIC, signer) | ||
<< " not added UNtrustesd/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNtrustesd/ --> UNtrusted
Rebased on 0.70.0-b4 and addressed the latest set of comments. I extending the private Thanks @wilsonianb and @scottschurr for your careful reviews; the code is definitely better for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I spotted was a comment that might need attention.
I like how current<>
requires a lambda to do the reserve()
. It means every caller of current<>
must at least think about that issue. 👍
@@ -199,11 +202,11 @@ handleNewValidation(Application& app, | |||
bool shouldRelay = false; | |||
|
|||
// only add trusted or listed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still correct since we're no longer checking val->isTrusted()
on the following line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, still true. pubKey
is only set if the validation is trusted or listed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are few fee and amendment files that could include STValidation.h
instead of RCLValidations.h
This worked for me
wilsonianb@78eaa98
soci::transaction tr(*db); | ||
for (auto it: vector) | ||
{ | ||
s.erase (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This erase
wasn't carried over to doStaleWrite
. Do we not need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!!!!
boost::none, | ||
[&](std::size_t) {}, // nothing to reserve | ||
[&](NodeKey const&, Validation const& v) { | ||
if (v.trusted() && v.isPreviousLedgerID(ledgerID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't new and doesn't have to be addressed here, but it seems like the way we handle a validation's previous ledger id could be improved.
The previous ledger id appears to only be set when we replace a validator's last seen current validation with the new one.
https://github.com/ripple/rippled/pull/2084/files#diff-46f83c3cd15dd825b63068c0ab4dea00R423
This means that if we receive a validation from a validator for the first time (or after not seeing one in the past 5 minutes), the previous ledger id would not be set and the validation would not be counted in getNodesAfter
or currentTrustedDistribution
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think that may be the best we can do though, since we have a gap in the validation stream from that validator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option would be for getNodesAfter
and currentTrustedDistribution
to take a ledger sequence instead of a LedgerID
.
In both cases, we don't care if there was actually a validation for the previous ledger. We only care that a validation is for ledger that follows a particular ledger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the LedgerID
, wouldn't we be counting ledgers that are potentially on a different ledger chain? At least, for currentTrustedDistrubution
, I think care about that distinction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would change the assumption about what's being counted in currentTrustedDistrubution
. The problem is that the count under the current assumption (same ledger chain) can be inaccurate if we don't set the previous ledger id for a validation. It seems like we have the information to determine that a lone validation validates a ledger id whose sequence is immediately after a given sequence. It would just be a matter of finding the least messy way of figuring that out.
src/ripple/consensus/Validations.h
Outdated
maybeStaleValidation.emplace(std::move(oldVal)); | ||
// Replace old val in the map and set the previous ledger ID | ||
ins.first->second = val; | ||
ins.first->second.setPreviousLedgerID(oldID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the rare case that we are replacing an identical validation from a signing revoked key, I think the previous ledger id should actually be set to the previous ledger id of the validation being replaced.
This could be changed to something like:
wilsonianb@fcf2f4f#diff-46f83c3cd15dd825b63068c0ab4dea00R427
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'll cherry-pick with a modification to ensure we determine oldID beforre moving out the oldVal.
src/ripple/consensus/Validations.h
Outdated
/** Expire old validation sets | ||
|
||
Remove validation sets that were accessed more than | ||
VALIDATION_SET_EXPIRES ago. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VALIDATION_SET_EXPIRES
--> validationSET_EXPIRES
// This ensures that all validations are written upon return from | ||
// this function. | ||
|
||
while (staleWriting_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I understand the purpose of this loop, it could be put in an else
|
||
// A new id is needed since the signing key changed | ||
BEAST_EXPECT( | ||
AddOutcome::sameSeq == harness.add(a, Seq{2}, ID{20})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also test adding a ledger with a new signing key and the same sequence and id?
} | ||
|
||
{ | ||
// Processing validations out of order should ignore the older |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also test the following:
- Current validation followed by validation with later sequence but earlier sign (and seen?) time
- Current validation followed by validation with lower sequence (but never before seen id) and later sign (and seen?) time
|
||
BEAST_EXPECT(AddOutcome::current == harness.add(a, Seq{1}, ID{1})); | ||
|
||
BEAST_EXPECT(harness.stale().empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add another call to harness.vals().currentTrusted();
before this check
|
||
// Only a is trusted | ||
BEAST_EXPECT(harness.vals().currentTrusted().size() == 1); | ||
BEAST_EXPECT(harness.vals().currentTrusted()[0].ledgerID() == ID{1}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a different ID
is used for b
, it is clearer that the trusted validation is from a
Node a = harness.makeNode(); | ||
|
||
BEAST_EXPECT(AddOutcome::current == harness.add(a, Seq{1}, ID{1})); | ||
BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{1})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding another call to harness.vals().expire()
before this check
I will squash and rebase after the final 👍 . I also tested running |
@@ -480,6 +486,14 @@ class Validations_test : public beast::unit_test::suite | |||
|
|||
BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{2}) == 0); | |||
BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{20}) == 1); | |||
|
|||
// A new key, but re-issue the same validation, since it is the | |||
// same, it is considered a repeat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want this be AddOutcome::current
for this case:
#2084 (comment)
We would need to change this check in Validations::add
https://github.com/ripple/rippled/pull/2084/files#diff-46f83c3cd15dd825b63068c0ab4dea00R383
to something like
auto const ret = byLedger_[id].emplace(key, val);
if (! ret.second && ret.first->second.key() == val.key())
return AddOutcome::repeat;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I should've noticed that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, there was an additional change for this.
} | ||
{ | ||
// Processs validations out of order with shifted times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Processs --> Process
BEAST_EXPECT( | ||
AddOutcome::current == harness.add(a, Seq{8}, ID{8})); | ||
|
||
// Process a validation that has "later" seq but early sign time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting how sign time trumps ledger sequence, but it makes sense.
Rebased and squashed after addressing last set of @wilsonianb's comments. |
Build error in CircleCI:
|
Argg, C++17 support strikes again! |
src/ripple/consensus/Validations.h
Outdated
val.key() != oldVal.key()) | ||
{ | ||
// This is either a newer validation or a new signing key | ||
LedgerID const oldID = [&]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename oldID
as prevID
<< seq; | ||
} | ||
|
||
const bool currVal = res == AddOutcome::current; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also want to call checkAccept
and relay for sameSeq
- If a validator publishes a second validation for the same sequence (different ledger id), we should call
checkAccept
and relay - If a validator changes keys and publishes a new validation for the same sequence (same ledger id), we should at least relay. (Calling
checkAccept
should be inconsequential.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
I'm still in favor of ditching ValidationCounts
/highNode
if @JoelKatz approves.
#2084 (comment)
Thanks @wilsonianb and @scottschurr for your thorough reviews. This is a much better change now. I'll consider the |
Squashed and rebased on 0.70.0-b6 |
Rebased onto 0.70.0-b8 and added the changest from #2094 to avoid merge conflicts when putting into a future release. |
Introduces a generic Validations class for storing and querying current and recent validations. Aditionally migrates the validation related timing constants from LedgerTiming to the new Validations code. Introduces RCLValidations as the version of Validations adapted for use in the RCL. This adds support for flushing/writing validations to the sqlite log and also manages concurrent access to the Validations data. RCLValidations::flush() no longer uses the JobQueue for its database write at shutdown. It performs the write directly without changing threads.
Rebased on 0.70.1 |
Merged into 0.80.0-b1 |
This PR refactors the
Validations
class into a generic version that can be adapted for the consensus simulation framework.RCLValidations
replaces the current implementation by adapting the generic version to use RCL validations.The first two changesets are minor refactorings that simplify the generic implementation.
The final changeset cherry-picks @scottschurr's work in #2083 and could use his review.