Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validations refactor (RIPD-1412) #2084

Closed
wants to merge 2 commits into from
Closed

Conversation

bachase
Copy link
Collaborator

@bachase bachase commented Apr 11, 2017

This PR refactors the Validations class into a generic version that can be adapted for the consensus simulation framework. RCLValidations replaces the current implementation by adapting the generic version to use RCL validations.

The first two changesets are minor refactorings that simplify the generic implementation.
The final changeset cherry-picks @scottschurr's work in #2083 and could use his review.

@bachase
Copy link
Collaborator Author

bachase commented Apr 11, 2017

Documentation preview http://bachase.github.io/ripd1412/index.html

@codecov-io
Copy link

codecov-io commented Apr 11, 2017

Codecov Report

Merging #2084 into develop will increase coverage by 0.15%.
The diff coverage is 83.9%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2084      +/-   ##
===========================================
+ Coverage    68.98%   69.14%   +0.15%     
===========================================
  Files          684      685       +1     
  Lines        50461    50530      +69     
===========================================
+ Hits         34812    34938     +126     
+ Misses       15649    15592      -57
Impacted Files Coverage Δ
src/ripple/app/misc/FeeVote.h 100% <ø> (ø) ⬆️
src/ripple/app/main/Application.h 100% <ø> (ø) ⬆️
src/ripple/consensus/LedgerTiming.h 95.83% <ø> (ø) ⬆️
src/ripple/app/misc/AmendmentTable.h 100% <ø> (ø) ⬆️
src/ripple/app/misc/NetworkOPs.cpp 58.7% <0%> (ø) ⬆️
src/ripple/overlay/impl/PeerImp.cpp 0% <0%> (ø) ⬆️
src/ripple/consensus/DisputedTx.h 81.94% <100%> (ø) ⬆️
src/ripple/consensus/Consensus.h 84.89% <100%> (+0.06%) ⬆️
src/ripple/app/misc/impl/AmendmentTable.cpp 92.78% <100%> (ø) ⬆️
src/ripple/consensus/Consensus.cpp 82.69% <100%> (ø)
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bfd9de...1438ae3. Read the comment docs.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly confident there's a locking issue that needs attention in RCLValidations::onStale(). Other than that the comments are mostly advisory and my own personal opinions.

}

std::shared_ptr<ValidationSet> findSet (uint256 const& ledgerHash)
ValidationSet * findSet (uint256 const& ledgerHash)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both findCreateSet and findSet I wonder if there's some risk that a returned ValidationSet might be close to the end of its 10 minute life span. If the time of most recent reference is more important than time-since-creation you could consider calling touch() on the returned reference. That will update the time on the returned reference and move it to the front of the chronological list.

If life span is not a concern then it might be nice to add a comment about why we shouldn't worry about it. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this was temporary while I refactored. In the new version of the code in Validations, the equivalent lookup into the aged map still returns a ValidationSet *, but no public methods can return that set so the lifetime is guaranteed.

As you might be thinking, that is only "enforced" by convention and a soon-to-be-added comment. An alternative I considered was to completely restrict access to the map and only allow iteration over it via a functor.

{
return mValidations.fetch (ledgerHash);
auto it = mValidations.find(ledgerHash);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto const?

return false;

auto it = mCurrentValidations.find (*pubKey);
auto res = mCurrentValidations.emplace(*pubKey,val);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto const?

{
std::vector <NetClock::time_point> times;
ScopedLockType sl (mLock);
if (auto j = findSet (hash))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto const?

ScopedLockType sl(lock_);
auto res = Validations::add(now(), *pubKey, val);

// This is a duplicate validadtion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validadtion -> validation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dad joke?


{
{
auto v = a.validation(Seq{1}, ID{1});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto const?

auto val3 = a.validation(Seq{3}, ID{3});

clock.advance(4s);
auto val4 = a.validation(Seq{4}, ID{4});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto const here?

clock.advance(p.VALIDATION_CURRENT_LOCAL);

// trigger iteration over current
vals.current(a.now(), [](auto const&) {});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test.


Validations::flush();
Validations::flush([&](RCLValidation && v)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wince. When possible I try not to call outside functions with a lock held. This code base doesn't hold to that rule very well, so I'll leave this for your consideration. But at this point the only place we need the ScopedLockType is for the staleValidations_.emplace_back(std::move(v.val_)); inside the upcoming lambda.

Consider putting a lock inside the lambda (yes, the lock will thrash more) and postpone this ScopedLockType to just before the if (anyNew && !writing_) a few lines down.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I move the lock inside, do I need to consider concurrent calls to RCLValidations::flush? My understanding is that the lock protects all data members of the class, both in the old and new code. If another call to RCLValidations::flush occurs, wouldn't this cause a data race?

As long as I document the semantics of flush, I'm not too concerned about calling it while holding the lock and would prefer to avoid thrashing. If we are comfortable calling the external code in .emplace_back while holding a lock, I think extending that comfort to the flush function is acceptable. Does that seem reasonable or would you prefer I move the lock inside?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be looking at the code more closely than I am, but I'm not seeing any data races if the lock is inside the lambda. anyNew is a local, so it is not subject to thread races.

If we put the lock inside the lambda, we also need to add a lock just in front of line 268 to protect the rest of the function. If that's what you're worried about then I agree. We need a lock there as well.

In terms of what is "outside code" my personal approach is to have these buckets:

  • std library code.
  • non-std library, but well vetted, library code (like boost and beast).
  • code in this same class.
  • everything else (including other functions/classes in this code base).

Most calls into std library code I don't worry about. The characteristics of that code are stable and well understood. I get more nervous as I go further down the list because the code is less well characterized and more volatile. Note that, in my opinion, calling into our own code outside this class is the scariest. It is the most subject to change and we have the smallest understanding of the consequences of these changes.

Let me provide some more background for my concern. When I hold a lock and then call an outside (non-std library) function I am, in effect, saying two things:

  • It is okay if I hold this lock indefinitely, since I don't know how long this lock will be held.
  • I strongly believe that there will be no lock order problems introduced by the outside code trying to acquire another lock or to re-acquire this lock.

I believe that you've checked all of the code that you're modifying at this moment. I'm sure it's all good right now.

My primary concern is that, as the outside code is maintained, those maintainers have no idea that this lock is being held. So problems like the ones I just described are easy to introduce during maintenance.

As I said, the existing code base is pretty bad (in my opinion) about holding locks over external calls. And sometimes holding a lock over an external call is just flat-out unavoidable. So you'll not be doing anything that isn't seen elsewhere in the code base. I suspect in this case it is avoidable (at the cost of some extra lock thrash). I'm just pointing it out for your consideration.

You know best how you should manage the lock, and I leave that to your discretion. I'm sure you'll make the best choice for the code base. Thanks for listening.

@@ -240,24 +240,45 @@ RCLValidations::onStale(RCLValidation&& v)
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about RCLValidations::onStale(). In the old implementation the moral equivalent of onStale() was always called with RCLValidations::lock_ held by a std::lock_guard. Now all of the calls to onStale are coming from outside RCLValidations. So I'm fairly confident the callers don't hold the lock_.

Any data members of RCLValidations that are accessed by RCLValidations::doWrite() need to always be protected by lock_. That would include both staleValidations_ and writing_.

It may be sufficient to simply grab the lock_ at the very top of onStale(). I'm unsure because I don't know the requirements of all of the places that call onStale(). The result might look like this:

    {
        ScopedLockType sl(lock_);
        staleValidations_.emplace_back(std::move(v.val_));
        if (writing_)
            return;

        writing_ = true;
    }
    app_.getJobQueue().addJob(...

Once the locking situation is cleared up I'll look hard at the code again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is that onStale is only accessible and callable from the generic Validations code and any calls into the Validations code from RCLConsensus require holding the lock for the duration of the call to the base class. The fact that this is opaque is a shortcoming of the CRTP design. I'll try to think of a way to make it clearer outside of just commenting. I'm open to suggestions too =).

@bachase
Copy link
Collaborator Author

bachase commented Apr 13, 2017

@scottschurr has brought up some good points that might warrant switching away from the CRTP design. I'll take a day or two to see if it will stick and will potentially re-submit the review if it makes more sense. Feel free to put this on the back-burner until I comment further.

I will note that I was partial to CRTP since I used it in the Consensus refactor. However, there was much more algorithmic specialization there. This class has a single onStale callback so the higher degree of coupling is (probably) not necessary.

Copy link
Contributor

@wilsonianb wilsonianb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aditionally --> Additionally in the Add generic Validations commit message

// Signing key of node that published the validation
NodeKey key() const;

// Identifier of node that published the validaton
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validaton -> validation

{
//! The number of trusted validations
std::size_t count;
//! The highest trusting node ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trusting -> trusted?

if a time is specified

@param t (Optional) Time used to determine staleness
@param f Callable with signature (NodeKey const &, Validations const &)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why this displays with |6| in the docs?
Callable with signature (NodeKey const &, |6|Validations const &)
https://bachase.github.io/ripd1412/rippled/ref/ripple__Validations/current.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It had come up in the past, it seems to be a bug in how docca renders some templates.

public:
/** Constructor

@param p ValidationParms to control staleness/expiration of validaitons
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current, //< This was a new validation and was added
repeat, //< Already had this validation
stale, //< Not current or was older than current from this node
sameSeq, //< Had a validation with same sequence number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


/** Wrapper over STValidation for generic Validation code

Wraps an STValidation::pointer for compatiblity with the generic validation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compatiblity --> compatibility

auto val3 = a.validation(Seq{3}, ID{3});

clock.advance(4s);
auto val4 = a.validation(Seq{4}, ID{4});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using Seq{3} here also.

// first round a,b,c agree
for(auto const & node : {a,b,c})
BEAST_EXPECT(AddOutcome::current == vals.add(node, Seq{1}, ID{1}));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a trusted node d with validations for different ledger ids.

LedgerID const& currentLedger,
LedgerID const& priorLedger,
std::size_t cutoffBefore,
beast::Journal& j)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we got rid of the one trace log in this function, Validations.h wouldn't need to include <ripple/beast/utility/Journal.h>

}

{
// Cutoff old sequence numberss
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numberss --> numbers

@bachase
Copy link
Collaborator Author

bachase commented Apr 18, 2017

I just pushed two new changesets. The first addresses many of the stylistic and auto-abuse issues @scottschurr identified. The second switches from CRTP to a policy design and addresses issues @wilsonianb identified.

I decided to have the generic Validations class own the instance of the policy as a private member. I believe this helps indicate that the policy class is completely subservient to the Validations instance. The RCLValidationsPolicy owns the stale validations and has its own mutex for managing access. The Validations class has its own mutex for managing access to its members--the current and byLedger Validations. I've avoided calling into the policy instance while holding a lock, save for one onStale call which I believe is reasonable to avoid some copies.

I kept the history of commits for reviewers, but will gladly squash if that is easier.

bool shouldRelay = false;

// only add trusted or listed
if (val->isTrusted() || pubKey)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wilsonianb This was my approach to eliminating the extra isCurrent call. Can you spot check it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be if (pubKey) especially since we are using pubKey below.

@bachase
Copy link
Collaborator Author

bachase commented Apr 18, 2017

Updated documentation preview @ http://bachase.github.io/ripd1412/index.html

@wilsonianb
Copy link
Contributor

I'm getting these compile warnings with the new commits

In file included from src/test/consensus/Validations_test.cpp:22:0,
                 from src/test/unity/consensus_test_unity.cpp:22:
src/ripple/consensus/Validations.h: In instantiation of 'ripple::Validations<StalePolicy, Validation, MutexType>::Validations(const ripple::ValidationParms&, beast::abstract_clock<std::chrono::_V2::steady_clock>&, beast::Journal, Ts&& ...) [with Ts = {ripple::test::Validations_test::StaleData&, beast::manual_clock<std::chrono::_V2::steady_clock>&}; StalePolicy = ripple::test::Validations_test::StalePolicy; Validation = ripple::test::Validations_test::Validation; MutexType = ripple::test::Validations_test::DummyMutex]':
src/test/consensus/Validations_test.cpp:380:63:   required from here
src/ripple/consensus/Validations.h:227:20: warning: 'ripple::Validations<ripple::test::Validations_test::StalePolicy, ripple::test::Validations_test::Validation, ripple::test::Validations_test::DummyMutex>::j_' will be initialized after [-Wreorder]
     beast::Journal j_;
                    ^
src/ripple/consensus/Validations.h:225:17: warning:   'ripple::test::Validations_test::StalePolicy ripple::Validations<ripple::test::Validations_test::StalePolicy, ripple::test::Validations_test::Validation, ripple::test::Validations_test::DummyMutex>::stalePolicy_' [-Wreorder]
     StalePolicy stalePolicy_;
                 ^
src/ripple/consensus/Validations.h:303:5: warning:   when initialized here [-Wreorder]
     Validations(
     ^

@scottschurr
Copy link
Collaborator

@bchase: Seeing this error on Travis build 9129.4:

src/test/consensus/Validations_test.cpp:892:34: error: no matching literal operator for call to 'operator""s' with argument of type 'unsigned long long' or 'const char *', and no matching literal operator template
        harness.clock().advance(1s);
                                 ^
1 error generated.

Are you missing a using namespace std::chrono_literals;?

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I like the way the the policy-oriented design worked out. The encapsulation seemed like it worked well. I left a few comments, but nothing that I think desperately needs fixing. Thanks for including me in the review. I learned some stuff.

// allocated to validators that were listed by publishers we trusted but
// that we didn't choose to trust. The shorter term plan was just to forward
// untrusted validations if peers wanted them or if we had the
// ability/bandwidth to. None of that was implemented.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comments by @JoelKatx.

Here's a note, not a request for a change. When I see a FIXME I envision a day where someone sets loose through our source code to address all the FIXMEs. They would either fix the issue (and remove the comment) or just remove the comment (because we decide it's not really a problem). A quick grep shows there are at least 31 FIXME comments between src/ripple and src/test. If you add in the VFALCO and NIKB TODOs and NOTEs you end up with over 300 items.

For our intrepid code warrior to be successful at that mission, each FIXME comment needs to clearly identify what the problem is and, maybe not the fix, but at least the imagined desirable outcome. For me, at least, if I were to charge out on that FIXME mission, I wouldn't know what to do with this comment because I wouldn't know which of the unimplemented stuff really needed to be implemented or why. We may not have that level of information for this comment. But, if we do, it would be nice to include it.

@param ledgerID The identifier of the ledger
@return A pointer to the set of validations, nullptr if none exist.
@param f Callable with signature (NodeKey const &, Validation const &)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the C++ trivia front, @HowardHinnant: I vaguely remember that for C++20(?) folks are headed toward referring to things that support INVOKE as Invocable rather than as Callable? For example, in C++17 there's an is_invocable type trait... Not that anything needs to be done here.

auto const oldSeq = oldVal.seq();
auto const newSeq = val.seq();
std::uint32_t const oldSeq = oldVal.seq();
std::uint32_t const newSeq = val.seq();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test for narrowing conversions I changed this from an assignment-style initialization to a braced initialization. This is cool because you get a compilation failure on a narrowing conversion. Low and behold, if I use this for initialization...

std::uint32_t const oldSeq {oldVal.seq()};

I get a compile error out of RCLValidations.cpp:

In file included from src/ripple/app/consensus/RCLValidations.cpp:21:
In file included from src/ripple/app/consensus/RCLValidations.h:24:
src/ripple/consensus/Validations.h:384:45: error: non-constant-expression cannot
      be narrowed from type 'std::size_t' (aka 'unsigned long') to
      'std::uint32_t' (aka 'unsigned int') in initializer list
      [-Wc++11-narrowing]
                std::uint32_t const oldSeq {oldVal.seq()};

Some people might take that as a sign we should be using auto instead of an explicit type. My personal take on it is that the uint32_t declaration is dead on (we know that's the type of the sequence), and we should be getting more aggressive about using brace initialization instead of assignment-style initialization. That way the compiler will find our errors for us.

I'm not going to request any changes as a result of this experiment. I'm sure the code base is rife with this sort of problem. But I thought it was an interesting lesson. Personally I'm going to be putting more emphasis on brace initialization (but don't combine it with auto!) in my own coding style after running this experiment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! I intended to make seq std::uint32_t everywhere and had accidentally left it as a std::size_t in RCLValidation.

class RCLValidation;
class RCLValidationsPolicy;
using RCLValidations =
Validations<RCLValidationsPolicy, RCLValidation, std::mutex>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice forward declaration.

using LedgerID =
decay_result_t<decltype (&Validation::ledgerID)(Validation)>;
using NodeKey = decay_result_t<decltype (&Validation::key)(Validation)>;
using NodeID = decay_result_t<decltype (&Validation::nodeID)(Validation)>;
using ValidationMap = hash_map<NodeKey, Validation>;

using ScopedLock = std::lock_guard<MutexType>;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For extra credit you could add some static_asserts for the methods that StalePolicy and Validation need to provide. But not required...

}

{
// Test the node changing signing key, then reissuing a ledger

// Confirm old ledger on hand, but not new ledger
BEAST_EXPECT(vals.exists(ID{2}));
BEAST_EXPECT(!vals.exists(ID{20}));
BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{2}));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this test will pass if numTrustedForLedger() returns 2...

// ScopedLockType& acts as a reminder to future maintainers.
void
RCLValidations::doWrite(ScopedLockType&)
RCLValidationsPolicy::doStaleWrite(ScopedLockType&)
{
std::string insVal(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insVal and findSeq could be static const

writing_ = true;
doWrite(sl);
staleWriting_ = true;
doStaleWrite(sl);
}

// Handle the case where flush() is called while a queuedWrite
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... while a doStaleWrite() is...?

}

// NOTE: doWrite() must be called with mLock *locked*. The passed
// NOTE: doStaleWrite() must be called with mLock *locked*. The passed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mLock -> staleLock_?

{
using AddOutcome = RCLValidations::AddOutcome;

AddOutcome res = validations.add(*pubKey, val);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const res?


};

/** Implements the StalePolicy policy class for adapating Validations in the RCL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adapating --> adapting

/** Flush current validations to disk before shutdown.

@param remaining The remaining validations to flush
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is misaligned

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble spotting it on my end. Which part is misaligned?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return val_->getLedgerHash();
}

/// Validated ledger's sequence number (0 if none)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test that this returns 0 if the sequence is missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be changed to
return val_->isFieldPresent(sfLedgerSequence) ? val_->getFieldU32(sfLedgerSequence) : 0;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make it more explicit, but I believe that getFieldByValue will already to that https://github.com/ripple/rippled/blob/develop/src/ripple/protocol/STObject.h#L573

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see I wasn't consistent with that for loadFee. I'll add the explicit check.

return val_->getFieldU32(sfLedgerSequence);
}

/// Validation ledger's signing time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could change this to
/// Validation signing time

return val_->isTrusted();
}

/// Set the prior ledger hash this validation is replacing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replacing --> following

2. Add the validation to the set of validations if current.
3. If new and trusted, send the validation to the ledgerMaster.

@param app Application object containing validations and ledgerMesger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ledgerMesger --> ledgerMaster


beast::Journal j_;

//! StalePolicy details providing now(),onStale() and flush callbacks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now(),onStale() --> now(), onStale()

}

// Handle the case where flush() is called while a queuedWrite
// is already in progress.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this loop is necessary anymore.
Previously, it acted to make flush synchronous since the write job was queued, but now we call doStaleWrite directly.
https://github.com/ripple/rippled/blob/develop/src/ripple/app/misc/Validations.cpp#L443-L463

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a port of the changes in #2083. If flush is called while an existing async job was queued or is now running, this would block until it completes. It would be nice to drop this if we don't care about whether that job finishes before returning.

@scottschurr: What is your opinion? Is it important for the flush call to block until the potentially asynchronous call is complete?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wilsonianb and @bachase I think the loop is still useful. This is a thread synchronization point. A different thread may have called doStaleWrite(). If that thread is running when we enter flush() at shutdown, then staleWriting_ will be true (line 89). So we'll skip calling doStaleWrtie() ourselves and we'll let the other thread do our job for us. However when we exit this function all of the stale cache should be flushed to the database. The loop makes sure we don't leave the function until the stale cache is all safely in the database.

I had hoped the comments were sufficient to describe what we're up to. but maybe not. @wilsonianb, do you have a suggested change for the comments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that makes sense to me.
I'd vote for this being clearly conveyed in the comment:

when we exit this function all of the stale cache should be flushed to the database. The loop makes sure we don't leave the function until the stale cache is all safely in the database.

bool shouldRelay = false;

// only add trusted or listed
if (val->isTrusted() || pubKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be if (pubKey) especially since we are using pubKey below.

{
JLOG(j.debug()) << "Val for " << hash << " from "
<< toBase58(TokenType::TOKEN_NODE_PUBLIC, signer)
<< " not added UNtrustesd/";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNtrustesd/ --> UNtrusted

@bachase
Copy link
Collaborator Author

bachase commented Apr 24, 2017

Rebased on 0.70.0-b4 and addressed the latest set of comments. I extending the private byLedger and current members of Validations to allow reserving storage prior to iterating items. This does so under the same lock used for iteration. The interface is a tad more mucky, but since these are private helper members, it seems worth the trade-off.

Thanks @wilsonianb and @scottschurr for your careful reviews; the code is definitely better for it.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I spotted was a comment that might need attention.

I like how current<> requires a lambda to do the reserve(). It means every caller of current<> must at least think about that issue. 👍

@@ -199,11 +202,11 @@ handleNewValidation(Application& app,
bool shouldRelay = false;

// only add trusted or listed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still correct since we're no longer checking val->isTrusted() on the following line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, still true. pubKey is only set if the validation is trusted or listed.

Copy link
Contributor

@wilsonianb wilsonianb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are few fee and amendment files that could include STValidation.h instead of RCLValidations.h
This worked for me
wilsonianb@78eaa98

soci::transaction tr(*db);
for (auto it: vector)
{
s.erase ();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This erase wasn't carried over to doStaleWrite. Do we not need it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!!!!

boost::none,
[&](std::size_t) {}, // nothing to reserve
[&](NodeKey const&, Validation const& v) {
if (v.trusted() && v.isPreviousLedgerID(ledgerID))
Copy link
Contributor

@wilsonianb wilsonianb Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't new and doesn't have to be addressed here, but it seems like the way we handle a validation's previous ledger id could be improved.
The previous ledger id appears to only be set when we replace a validator's last seen current validation with the new one.
https://github.com/ripple/rippled/pull/2084/files#diff-46f83c3cd15dd825b63068c0ab4dea00R423
This means that if we receive a validation from a validator for the first time (or after not seeing one in the past 5 minutes), the previous ledger id would not be set and the validation would not be counted in getNodesAfter or currentTrustedDistribution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think that may be the best we can do though, since we have a gap in the validation stream from that validator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option would be for getNodesAfter and currentTrustedDistribution to take a ledger sequence instead of a LedgerID.
In both cases, we don't care if there was actually a validation for the previous ledger. We only care that a validation is for ledger that follows a particular ledger.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the LedgerID, wouldn't we be counting ledgers that are potentially on a different ledger chain? At least, for currentTrustedDistrubution, I think care about that distinction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would change the assumption about what's being counted in currentTrustedDistrubution. The problem is that the count under the current assumption (same ledger chain) can be inaccurate if we don't set the previous ledger id for a validation. It seems like we have the information to determine that a lone validation validates a ledger id whose sequence is immediately after a given sequence. It would just be a matter of finding the least messy way of figuring that out.

maybeStaleValidation.emplace(std::move(oldVal));
// Replace old val in the map and set the previous ledger ID
ins.first->second = val;
ins.first->second.setPreviousLedgerID(oldID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the rare case that we are replacing an identical validation from a signing revoked key, I think the previous ledger id should actually be set to the previous ledger id of the validation being replaced.
This could be changed to something like:
wilsonianb@fcf2f4f#diff-46f83c3cd15dd825b63068c0ab4dea00R427

Copy link
Collaborator Author

@bachase bachase Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll cherry-pick with a modification to ensure we determine oldID beforre moving out the oldVal.

/** Expire old validation sets

Remove validation sets that were accessed more than
VALIDATION_SET_EXPIRES ago.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VALIDATION_SET_EXPIRES --> validationSET_EXPIRES

// This ensures that all validations are written upon return from
// this function.

while (staleWriting_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I understand the purpose of this loop, it could be put in an else


// A new id is needed since the signing key changed
BEAST_EXPECT(
AddOutcome::sameSeq == harness.add(a, Seq{2}, ID{20}));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also test adding a ledger with a new signing key and the same sequence and id?

}

{
// Processing validations out of order should ignore the older
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also test the following:

  • Current validation followed by validation with later sequence but earlier sign (and seen?) time
  • Current validation followed by validation with lower sequence (but never before seen id) and later sign (and seen?) time


BEAST_EXPECT(AddOutcome::current == harness.add(a, Seq{1}, ID{1}));

BEAST_EXPECT(harness.stale().empty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add another call to harness.vals().currentTrusted(); before this check


// Only a is trusted
BEAST_EXPECT(harness.vals().currentTrusted().size() == 1);
BEAST_EXPECT(harness.vals().currentTrusted()[0].ledgerID() == ID{1});
Copy link
Contributor

@wilsonianb wilsonianb Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a different ID is used for b, it is clearer that the trusted validation is from a

Node a = harness.makeNode();

BEAST_EXPECT(AddOutcome::current == harness.add(a, Seq{1}, ID{1}));
BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{1}));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding another call to harness.vals().expire() before this check

@bachase
Copy link
Collaborator Author

bachase commented Apr 27, 2017

I will squash and rebase after the final 👍 . I also tested running rippled for a few minutes and looking at the validations in the sqlite database.

@@ -480,6 +486,14 @@ class Validations_test : public beast::unit_test::suite

BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{2}) == 0);
BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{20}) == 1);

// A new key, but re-issue the same validation, since it is the
// same, it is considered a repeat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want this be AddOutcome::current for this case:
#2084 (comment)
We would need to change this check in Validations::add
https://github.com/ripple/rippled/pull/2084/files#diff-46f83c3cd15dd825b63068c0ab4dea00R383
to something like

auto const ret = byLedger_[id].emplace(key, val);
if (! ret.second && ret.first->second.key() == val.key())
    return AddOutcome::repeat;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I should've noticed that too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, there was an additional change for this.

}
{
// Processs validations out of order with shifted times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Processs --> Process

BEAST_EXPECT(
AddOutcome::current == harness.add(a, Seq{8}, ID{8}));

// Process a validation that has "later" seq but early sign time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting how sign time trumps ledger sequence, but it makes sense.

@bachase
Copy link
Collaborator Author

bachase commented Apr 27, 2017

Rebased and squashed after addressing last set of @wilsonianb's comments.

@scottschurr
Copy link
Collaborator

Build error in CircleCI:

In file included from src/ripple/unity/app_consensus.cpp:21:
In file included from src/ripple/app/consensus/RCLConsensus.cpp:22:
In file included from src/ripple/app/consensus/RCLValidations.h:24:
src/ripple/consensus/Validations.h:422:47: error: no member named
      'insert_or_assign' in 'std::unordered_map<ripple::PublicKey,
      ripple::RCLValidation, beast::uhash<beast::spooky>,
      std::equal_to<ripple::PublicKey>, std::allocator<std::pair<const
      ripple::PublicKey, ripple::RCLValidation> > >'
                                validationMap.insert_or_assign(key, val);
                                ~~~~~~~~~~~~~ ^
src/ripple/app/consensus/RCLValidations.cpp:210:44: note: in instantiation of
      member function 'ripple::Validations<ripple::RCLValidationsPolicy,
      ripple::RCLValidation, std::mutex>::add' requested here
        AddOutcome const res = validations.add(*pubKey, val);
                                           ^

@bachase
Copy link
Collaborator Author

bachase commented Apr 27, 2017

Argg, C++17 support strikes again!

val.key() != oldVal.key())
{
// This is either a newer validation or a new signing key
LedgerID const oldID = [&]() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename oldID as prevID

<< seq;
}

const bool currVal = res == AddOutcome::current;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also want to call checkAccept and relay for sameSeq

  • If a validator publishes a second validation for the same sequence (different ledger id), we should call checkAccept and relay
  • If a validator changes keys and publishes a new validation for the same sequence (same ledger id), we should at least relay. (Calling checkAccept should be inconsequential.)

Copy link
Contributor

@wilsonianb wilsonianb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
I'm still in favor of ditching ValidationCounts/highNode if @JoelKatz approves.
#2084 (comment)

@bachase
Copy link
Collaborator Author

bachase commented May 1, 2017

Thanks @wilsonianb and @scottschurr for your thorough reviews. This is a much better change now.

I'll consider the ValidationsCounts/highNode change in a separate PR.

@bachase bachase added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label May 1, 2017
@bachase
Copy link
Collaborator Author

bachase commented May 18, 2017

Squashed and rebased on 0.70.0-b6

@bachase bachase added Rebase Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. and removed Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. labels Jun 2, 2017
@bachase
Copy link
Collaborator Author

bachase commented Jun 14, 2017

Rebased onto 0.70.0-b8 and added the changest from #2094 to avoid merge conflicts when putting into a future release.

bachase added 2 commits July 10, 2017 08:55
Introduces a generic Validations class for storing and querying current and
recent validations.  Aditionally migrates the validation related timing
constants from LedgerTiming to the new Validations code.

Introduces RCLValidations as the version of Validations adapted for use in the
RCL.  This adds support for flushing/writing validations to the sqlite log and
also manages concurrent access to the Validations data.

RCLValidations::flush() no longer uses the JobQueue for its database
write at shutdown.  It performs the write directly without
changing threads.
@bachase
Copy link
Collaborator Author

bachase commented Jul 10, 2017

Rebased on 0.70.1

@seelabs
Copy link
Collaborator

seelabs commented Jul 12, 2017

Merged into 0.80.0-b1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants