Validations refactor (RIPD-1412) #2084

bachase · 2017-04-11T16:59:14Z

This PR refactors the Validations class into a generic version that can be adapted for the consensus simulation framework. RCLValidations replaces the current implementation by adapting the generic version to use RCL validations.

The first two changesets are minor refactorings that simplify the generic implementation.
The final changeset cherry-picks @scottschurr's work in #2083 and could use his review.

bachase · 2017-04-11T16:59:23Z

Documentation preview http://bachase.github.io/ripd1412/index.html

codecov-io · 2017-04-11T17:41:31Z

Codecov Report

Merging #2084 into develop will increase coverage by 0.15%.
The diff coverage is 83.9%.

@@             Coverage Diff             @@
##           develop    #2084      +/-   ##
===========================================
+ Coverage    68.98%   69.14%   +0.15%     
===========================================
  Files          684      685       +1     
  Lines        50461    50530      +69     
===========================================
+ Hits         34812    34938     +126     
+ Misses       15649    15592      -57

Impacted Files	Coverage Δ
src/ripple/app/misc/FeeVote.h	`100% <ø> (ø)`	⬆️
src/ripple/app/main/Application.h	`100% <ø> (ø)`	⬆️
src/ripple/consensus/LedgerTiming.h	`95.83% <ø> (ø)`	⬆️
src/ripple/app/misc/AmendmentTable.h	`100% <ø> (ø)`	⬆️
src/ripple/app/misc/NetworkOPs.cpp	`58.7% <0%> (ø)`	⬆️
src/ripple/overlay/impl/PeerImp.cpp	`0% <0%> (ø)`	⬆️
src/ripple/consensus/DisputedTx.h	`81.94% <100%> (ø)`	⬆️
src/ripple/consensus/Consensus.h	`84.89% <100%> (+0.06%)`	⬆️
src/ripple/app/misc/impl/AmendmentTable.cpp	`92.78% <100%> (ø)`	⬆️
src/ripple/consensus/Consensus.cpp	`82.69% <100%> (ø)`
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bfd9de...1438ae3. Read the comment docs.

scottschurr

I'm fairly confident there's a locking issue that needs attention in RCLValidations::onStale(). Other than that the comments are mostly advisory and my own personal opinions.

scottschurr · 2017-04-11T20:56:11Z

src/ripple/app/misc/Validations.cpp

    }

-    std::shared_ptr<ValidationSet> findSet (uint256 const& ledgerHash)
+    ValidationSet * findSet (uint256 const& ledgerHash)


For both findCreateSet and findSet I wonder if there's some risk that a returned ValidationSet might be close to the end of its 10 minute life span. If the time of most recent reference is more important than time-since-creation you could consider calling touch() on the returned reference. That will update the time on the returned reference and move it to the front of the chronological list.

If life span is not a concern then it might be nice to add a comment about why we shouldn't worry about it. Thanks.

Correct, this was temporary while I refactored. In the new version of the code in Validations, the equivalent lookup into the aged map still returns a ValidationSet *, but no public methods can return that set so the lifetime is guaranteed.

As you might be thinking, that is only "enforced" by convention and a soon-to-be-added comment. An alternative I considered was to completely restrict access to the map and only allow iteration over it via a functor.

scottschurr · 2017-04-11T21:45:08Z

src/ripple/app/misc/Validations.cpp

    {
-        return mValidations.fetch (ledgerHash);
+        auto it = mValidations.find(ledgerHash);


auto const?

scottschurr · 2017-04-11T21:46:07Z

src/ripple/app/misc/Validations.cpp

                return false;

-            auto it = mCurrentValidations.find (*pubKey);
+            auto res = mCurrentValidations.emplace(*pubKey,val);


auto const?

scottschurr · 2017-04-11T21:48:05Z

src/ripple/app/misc/Validations.cpp

+    {
+        std::vector <NetClock::time_point> times;
+        ScopedLockType sl (mLock);
+        if (auto j = findSet (hash))


auto const?

scottschurr · 2017-04-11T23:27:42Z

src/ripple/app/consensus/RCLValidations.cpp

+        ScopedLockType sl(lock_);
+        auto res = Validations::add(now(), *pubKey, val);
+
+        // This is a duplicate validadtion


validadtion -> validation

scottschurr · 2017-04-12T19:11:50Z

src/test/consensus/Validations_test.cpp

+
+        {
+            {
+                auto v = a.validation(Seq{1}, ID{1});


auto const?

scottschurr · 2017-04-12T20:07:07Z

src/test/consensus/Validations_test.cpp

+                auto val3 = a.validation(Seq{3}, ID{3});
+
+                clock.advance(4s);
+                auto val4 = a.validation(Seq{4}, ID{4});


auto const here?

scottschurr · 2017-04-12T20:08:04Z

src/test/consensus/Validations_test.cpp

+        clock.advance(p.VALIDATION_CURRENT_LOCAL);
+
+        // trigger iteration over current
+        vals.current(a.now(), [](auto const&) {});


scottschurr · 2017-04-12T22:01:55Z

src/ripple/app/consensus/RCLValidations.cpp


-    Validations::flush();
+        Validations::flush([&](RCLValidation && v)


Wince. When possible I try not to call outside functions with a lock held. This code base doesn't hold to that rule very well, so I'll leave this for your consideration. But at this point the only place we need the ScopedLockType is for the staleValidations_.emplace_back(std::move(v.val_)); inside the upcoming lambda.

Consider putting a lock inside the lambda (yes, the lock will thrash more) and postpone this ScopedLockType to just before the if (anyNew && !writing_) a few lines down.

If I move the lock inside, do I need to consider concurrent calls to RCLValidations::flush? My understanding is that the lock protects all data members of the class, both in the old and new code. If another call to RCLValidations::flush occurs, wouldn't this cause a data race?

As long as I document the semantics of flush, I'm not too concerned about calling it while holding the lock and would prefer to avoid thrashing. If we are comfortable calling the external code in .emplace_back while holding a lock, I think extending that comfort to the flush function is acceptable. Does that seem reasonable or would you prefer I move the lock inside?

You may be looking at the code more closely than I am, but I'm not seeing any data races if the lock is inside the lambda. anyNew is a local, so it is not subject to thread races.

If we put the lock inside the lambda, we also need to add a lock just in front of line 268 to protect the rest of the function. If that's what you're worried about then I agree. We need a lock there as well.

In terms of what is "outside code" my personal approach is to have these buckets:

std library code.

non-std library, but well vetted, library code (like boost and beast).

code in this same class.

everything else (including other functions/classes in this code base).

Most calls into std library code I don't worry about. The characteristics of that code are stable and well understood. I get more nervous as I go further down the list because the code is less well characterized and more volatile. Note that, in my opinion, calling into our own code outside this class is the scariest. It is the most subject to change and we have the smallest understanding of the consequences of these changes.

Let me provide some more background for my concern. When I hold a lock and then call an outside (non-std library) function I am, in effect, saying two things:

It is okay if I hold this lock indefinitely, since I don't know how long this lock will be held.

I strongly believe that there will be no lock order problems introduced by the outside code trying to acquire another lock or to re-acquire this lock.

I believe that you've checked all of the code that you're modifying at this moment. I'm sure it's all good right now.

My primary concern is that, as the outside code is maintained, those maintainers have no idea that this lock is being held. So problems like the ones I just described are easy to introduce during maintenance.

As I said, the existing code base is pretty bad (in my opinion) about holding locks over external calls. And sometimes holding a lock over an external call is just flat-out unavoidable. So you'll not be doing anything that isn't seen elsewhere in the code base. I suspect in this case it is avoidable (at the cost of some extra lock thrash). I'm just pointing it out for your consideration.

You know best how you should manage the lock, and I leave that to your discretion. I'm sure you'll make the best choice for the code base. Thanks for listening.

scottschurr · 2017-04-12T22:04:53Z

src/ripple/app/consensus/RCLValidations.cpp

@@ -240,24 +240,45 @@ RCLValidations::onStale(RCLValidation&& v)
        return;


I'm worried about RCLValidations::onStale(). In the old implementation the moral equivalent of onStale() was always called with RCLValidations::lock_ held by a std::lock_guard. Now all of the calls to onStale are coming from outside RCLValidations. So I'm fairly confident the callers don't hold the lock_.

Any data members of RCLValidations that are accessed by RCLValidations::doWrite() need to always be protected by lock_. That would include both staleValidations_ and writing_.

It may be sufficient to simply grab the lock_ at the very top of onStale(). I'm unsure because I don't know the requirements of all of the places that call onStale(). The result might look like this:

{ ScopedLockType sl(lock_); staleValidations_.emplace_back(std::move(v.val_)); if (writing_) return; writing_ = true; } app_.getJobQueue().addJob(...

Once the locking situation is cleared up I'll look hard at the code again.

The intent is that onStale is only accessible and callable from the generic Validations code and any calls into the Validations code from RCLConsensus require holding the lock for the duration of the call to the base class. The fact that this is opaque is a shortcoming of the CRTP design. I'll try to think of a way to make it clearer outside of just commenting. I'm open to suggestions too =).

bachase · 2017-04-13T20:34:05Z

@scottschurr has brought up some good points that might warrant switching away from the CRTP design. I'll take a day or two to see if it will stick and will potentially re-submit the review if it makes more sense. Feel free to put this on the back-burner until I comment further.

I will note that I was partial to CRTP since I used it in the Consensus refactor. However, there was much more algorithmic specialization there. This class has a single onStale callback so the higher degree of coupling is (probably) not necessary.

wilsonianb

Aditionally --> Additionally in the Add generic Validations commit message

wilsonianb · 2017-04-14T19:58:27Z

src/ripple/consensus/Validations.h

+        // Signing key of node that published the validation
+        NodeKey key() const;
+
+        // Identifier of node that published the validaton


validaton -> validation

wilsonianb · 2017-04-14T20:00:53Z

src/ripple/consensus/Validations.h

+    {
+        //! The number of trusted validations
+        std::size_t count;
+        //! The highest trusting node ID


trusting -> trusted?

wilsonianb · 2017-04-14T20:38:08Z

src/ripple/consensus/Validations.h

+        if a time is specified
+
+        @param t (Optional) Time used to determine staleness
+        @param f Callable with signature (NodeKey const &, Validations const &)


Any idea why this displays with |6| in the docs?
Callable with signature (NodeKey const &, |6|Validations const &)
https://bachase.github.io/ripd1412/rippled/ref/ripple__Validations/current.html

It had come up in the past, it seems to be a bug in how docca renders some templates.

wilsonianb · 2017-04-14T20:44:59Z

src/ripple/consensus/Validations.h

+public:
+    /** Constructor
+
+        @param p ValidationParms to control staleness/expiration of validaitons


Another |6| in the docs
https://bachase.github.io/ripd1412/rippled/ref/ripple__Validations/Validations.html

wilsonianb · 2017-04-14T20:47:30Z

src/ripple/consensus/Validations.h

+        current,  //< This was a new validation and was added
+        repeat,   //< Already had this validation
+        stale,    //< Not current or was older than current from this node
+        sameSeq,  //< Had a validation with same sequence number


These descriptions aren't appearing in the docs
https://bachase.github.io/ripd1412/rippled/ref/ripple__Validations/AddOutcome.html

wilsonianb · 2017-04-17T17:13:02Z

src/ripple/app/consensus/RCLValidations.h

+
+/** Wrapper over STValidation for generic Validation code
+
+    Wraps an STValidation::pointer for compatiblity with the generic validation


compatiblity --> compatibility

wilsonianb · 2017-04-17T17:45:02Z

src/test/consensus/Validations_test.cpp

+                auto val3 = a.validation(Seq{3}, ID{3});
+
+                clock.advance(4s);
+                auto val4 = a.validation(Seq{4}, ID{4});


Consider using Seq{3} here also.

wilsonianb · 2017-04-17T18:04:33Z

src/test/consensus/Validations_test.cpp

+        // first round a,b,c agree
+        for(auto const & node : {a,b,c})
+            BEAST_EXPECT(AddOutcome::current == vals.add(node, Seq{1}, ID{1}));
+


You could add a trusted node d with validations for different ledger ids.

wilsonianb · 2017-04-17T18:12:31Z

src/ripple/consensus/Validations.h

+        LedgerID const& currentLedger,
+        LedgerID const& priorLedger,
+        std::size_t cutoffBefore,
+        beast::Journal& j)


If we got rid of the one trace log in this function, Validations.h wouldn't need to include <ripple/beast/utility/Journal.h>

wilsonianb · 2017-04-17T18:18:09Z

src/test/consensus/Validations_test.cpp

+        }
+
+        {
+            // Cutoff old sequence numberss


numberss --> numbers

bachase · 2017-04-18T15:05:48Z

I just pushed two new changesets. The first addresses many of the stylistic and auto-abuse issues @scottschurr identified. The second switches from CRTP to a policy design and addresses issues @wilsonianb identified.

I decided to have the generic Validations class own the instance of the policy as a private member. I believe this helps indicate that the policy class is completely subservient to the Validations instance. The RCLValidationsPolicy owns the stale validations and has its own mutex for managing access. The Validations class has its own mutex for managing access to its members--the current and byLedger Validations. I've avoided calling into the policy instance while holding a lock, save for one onStale call which I believe is reasonable to avoid some copies.

I kept the history of commits for reviewers, but will gladly squash if that is easier.

bachase · 2017-04-18T15:06:51Z

src/ripple/app/consensus/RCLValidations.cpp

+    bool shouldRelay = false;
+
+    // only add trusted or listed
+    if (val->isTrusted() || pubKey)


@wilsonianb This was my approach to eliminating the extra isCurrent call. Can you spot check it?

I think this could just be if (pubKey) especially since we are using pubKey below.

bachase · 2017-04-18T15:15:59Z

Updated documentation preview @ http://bachase.github.io/ripd1412/index.html

wilsonianb · 2017-04-18T15:37:37Z

I'm getting these compile warnings with the new commits

In file included from src/test/consensus/Validations_test.cpp:22:0,
                 from src/test/unity/consensus_test_unity.cpp:22:
src/ripple/consensus/Validations.h: In instantiation of 'ripple::Validations<StalePolicy, Validation, MutexType>::Validations(const ripple::ValidationParms&, beast::abstract_clock<std::chrono::_V2::steady_clock>&, beast::Journal, Ts&& ...) [with Ts = {ripple::test::Validations_test::StaleData&, beast::manual_clock<std::chrono::_V2::steady_clock>&}; StalePolicy = ripple::test::Validations_test::StalePolicy; Validation = ripple::test::Validations_test::Validation; MutexType = ripple::test::Validations_test::DummyMutex]':
src/test/consensus/Validations_test.cpp:380:63:   required from here
src/ripple/consensus/Validations.h:227:20: warning: 'ripple::Validations<ripple::test::Validations_test::StalePolicy, ripple::test::Validations_test::Validation, ripple::test::Validations_test::DummyMutex>::j_' will be initialized after [-Wreorder]
     beast::Journal j_;
                    ^
src/ripple/consensus/Validations.h:225:17: warning:   'ripple::test::Validations_test::StalePolicy ripple::Validations<ripple::test::Validations_test::StalePolicy, ripple::test::Validations_test::Validation, ripple::test::Validations_test::DummyMutex>::stalePolicy_' [-Wreorder]
     StalePolicy stalePolicy_;
                 ^
src/ripple/consensus/Validations.h:303:5: warning:   when initialized here [-Wreorder]
     Validations(
     ^

scottschurr · 2017-04-18T16:16:29Z

@bchase: Seeing this error on Travis build 9129.4:

src/test/consensus/Validations_test.cpp:892:34: error: no matching literal operator for call to 'operator""s' with argument of type 'unsigned long long' or 'const char *', and no matching literal operator template
        harness.clock().advance(1s);
                                 ^
1 error generated.

Are you missing a using namespace std::chrono_literals;?

scottschurr

Personally, I like the way the the policy-oriented design worked out. The encapsulation seemed like it worked well. I left a few comments, but nothing that I think desperately needs fixing. Thanks for including me in the review. I learned some stuff.

scottschurr · 2017-04-19T17:21:06Z

src/ripple/app/consensus/RCLValidations.cpp

+    // allocated to validators that were listed by publishers we trusted but
+    // that we didn't choose to trust. The shorter term plan was just to forward
+    // untrusted validations if peers wanted them or if we had the
+    // ability/bandwidth to. None of that was implemented.


Thanks for adding the comments by @JoelKatx.

Here's a note, not a request for a change. When I see a FIXME I envision a day where someone sets loose through our source code to address all the FIXMEs. They would either fix the issue (and remove the comment) or just remove the comment (because we decide it's not really a problem). A quick grep shows there are at least 31 FIXME comments between src/ripple and src/test. If you add in the VFALCO and NIKB TODOs and NOTEs you end up with over 300 items.

For our intrepid code warrior to be successful at that mission, each FIXME comment needs to clearly identify what the problem is and, maybe not the fix, but at least the imagined desirable outcome. For me, at least, if I were to charge out on that FIXME mission, I wouldn't know what to do with this comment because I wouldn't know which of the unimplemented stuff really needed to be implemented or why. We may not have that level of information for this comment. But, if we do, it would be nice to include it.

scottschurr · 2017-04-19T19:16:13Z

src/ripple/consensus/Validations.h

        @param ledgerID The identifier of the ledger
-        @return A pointer to the set of validations, nullptr if none exist.
+        @param f Callable with signature (NodeKey const &, Validation const &)


On the C++ trivia front, @HowardHinnant: I vaguely remember that for C++20(?) folks are headed toward referring to things that support INVOKE as Invocable rather than as Callable? For example, in C++17 there's an is_invocable type trait... Not that anything needs to be done here.

scottschurr · 2017-04-19T22:06:55Z

src/ripple/consensus/Validations.h

-            auto const oldSeq = oldVal.seq();
-            auto const newSeq = val.seq();
+            std::uint32_t const oldSeq = oldVal.seq();
+            std::uint32_t const newSeq = val.seq();


To test for narrowing conversions I changed this from an assignment-style initialization to a braced initialization. This is cool because you get a compilation failure on a narrowing conversion. Low and behold, if I use this for initialization...

std::uint32_t const oldSeq {oldVal.seq()};

I get a compile error out of RCLValidations.cpp:

In file included from src/ripple/app/consensus/RCLValidations.cpp:21: In file included from src/ripple/app/consensus/RCLValidations.h:24: src/ripple/consensus/Validations.h:384:45: error: non-constant-expression cannot be narrowed from type 'std::size_t' (aka 'unsigned long') to 'std::uint32_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing] std::uint32_t const oldSeq {oldVal.seq()};

Some people might take that as a sign we should be using auto instead of an explicit type. My personal take on it is that the uint32_t declaration is dead on (we know that's the type of the sequence), and we should be getting more aggressive about using brace initialization instead of assignment-style initialization. That way the compiler will find our errors for us.

I'm not going to request any changes as a result of this experiment. I'm sure the code base is rife with this sort of problem. But I thought it was an interesting lesson. Personally I'm going to be putting more emphasis on brace initialization (but don't combine it with auto!) in my own coding style after running this experiment.

Thanks for catching this! I intended to make seq std::uint32_t everywhere and had accidentally left it as a std::size_t in RCLValidation.

scottschurr · 2017-04-19T22:17:47Z

src/ripple/app/main/Application.h

+class RCLValidation;
+class RCLValidationsPolicy;
+using RCLValidations =
+    Validations<RCLValidationsPolicy, RCLValidation, std::mutex>;


Nice forward declaration.

scottschurr · 2017-04-19T22:56:09Z

src/ripple/consensus/Validations.h

    using LedgerID =
        decay_result_t<decltype (&Validation::ledgerID)(Validation)>;
    using NodeKey = decay_result_t<decltype (&Validation::key)(Validation)>;
    using NodeID = decay_result_t<decltype (&Validation::nodeID)(Validation)>;
    using ValidationMap = hash_map<NodeKey, Validation>;

+    using ScopedLock = std::lock_guard<MutexType>;
+


For extra credit you could add some static_asserts for the methods that StalePolicy and Validation need to provide. But not required...

scottschurr · 2017-04-20T00:15:05Z

src/test/consensus/Validations_test.cpp

            }

            {
                // Test the node changing signing key, then reissuing a ledger

                // Confirm old ledger on hand, but not new ledger
-                BEAST_EXPECT(vals.exists(ID{2}));
-                BEAST_EXPECT(!vals.exists(ID{20}));
+                BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{2}));


FWIW, this test will pass if numTrustedForLedger() returns 2...

scottschurr · 2017-04-20T01:03:49Z

src/ripple/app/consensus/RCLValidations.cpp

 // ScopedLockType& acts as a reminder to future maintainers.
 void
-RCLValidations::doWrite(ScopedLockType&)
+RCLValidationsPolicy::doStaleWrite(ScopedLockType&)
 {
    std::string insVal(


insVal and findSeq could be static const

scottschurr · 2017-04-20T01:06:33Z

src/ripple/app/consensus/RCLValidations.cpp

-            writing_ = true;
-            doWrite(sl);
+            staleWriting_ = true;
+            doStaleWrite(sl);
        }

        // Handle the case where flush() is called while a queuedWrite


... while a doStaleWrite() is...?

scottschurr · 2017-04-20T01:09:52Z

src/ripple/app/consensus/RCLValidations.cpp

 }

-// NOTE: doWrite() must be called with mLock *locked*.  The passed
+// NOTE: doStaleWrite() must be called with mLock *locked*.  The passed


mLock -> staleLock_?

scottschurr · 2017-04-20T01:11:02Z

src/ripple/app/consensus/RCLValidations.cpp

+    {
+        using AddOutcome = RCLValidations::AddOutcome;
+
+        AddOutcome res = validations.add(*pubKey, val);


const res?

wilsonianb · 2017-04-18T15:50:50Z

src/ripple/app/consensus/RCLValidations.h

+
+};
+
+/** Implements the StalePolicy policy class for adapating Validations in the RCL


adapating --> adapting

wilsonianb · 2017-04-18T15:54:49Z

src/ripple/app/consensus/RCLValidations.h

+     /** Flush current validations to disk before shutdown.
+
+         @param remaining The remaining validations to flush
+      */


Comment is misaligned

I'm having trouble spotting it on my end. Which part is misaligned?

https://github.com/bachase/rippled/blob/2f0b7e02eb36dbcda73fed871d33f098483054d1/src/ripple/app/consensus/RCLValidations.h#L175-L179

wilsonianb · 2017-04-18T16:00:07Z

src/ripple/app/consensus/RCLValidations.h

+        return val_->getLedgerHash();
+    }
+
+    /// Validated ledger's sequence number (0 if none)


Can we test that this returns 0 if the sequence is missing?

This could be changed to
return val_->isFieldPresent(sfLedgerSequence) ? val_->getFieldU32(sfLedgerSequence) : 0;

I can make it more explicit, but I believe that getFieldByValue will already to that https://github.com/ripple/rippled/blob/develop/src/ripple/protocol/STObject.h#L573

I see I wasn't consistent with that for loadFee. I'll add the explicit check.

wilsonianb · 2017-04-18T16:24:41Z

src/ripple/app/consensus/RCLValidations.h

+        return val_->getFieldU32(sfLedgerSequence);
+    }
+
+    /// Validation ledger's signing time


You could change this to
/// Validation signing time

wilsonianb · 2017-04-18T16:36:36Z

src/ripple/app/consensus/RCLValidations.h

+        return val_->isTrusted();
+    }
+
+    /// Set the prior ledger hash this validation is replacing


replacing --> following

wilsonianb · 2017-04-18T16:48:11Z

src/ripple/app/consensus/RCLValidations.h

+    2. Add the validation to the set of validations if current.
+    3. If new and trusted, send the validation to the ledgerMaster.
+
+    @param app Application object containing validations and ledgerMesger


ledgerMesger --> ledgerMaster

wilsonianb · 2017-04-18T16:52:49Z

src/ripple/consensus/Validations.h

+
+    beast::Journal j_;
+
+    //! StalePolicy details providing now(),onStale() and flush callbacks


now(),onStale() --> now(), onStale()

wilsonianb · 2017-04-18T20:48:23Z

src/ripple/app/consensus/RCLValidations.cpp

+        }
+
+        // Handle the case where flush() is called while a queuedWrite
+        // is already in progress.


I don't think this loop is necessary anymore.
Previously, it acted to make flush synchronous since the write job was queued, but now we call doStaleWrite directly.
https://github.com/ripple/rippled/blob/develop/src/ripple/app/misc/Validations.cpp#L443-L463

This was a port of the changes in #2083. If flush is called while an existing async job was queued or is now running, this would block until it completes. It would be nice to drop this if we don't care about whether that job finishes before returning.

@scottschurr: What is your opinion? Is it important for the flush call to block until the potentially asynchronous call is complete?

@wilsonianb and @bachase I think the loop is still useful. This is a thread synchronization point. A different thread may have called doStaleWrite(). If that thread is running when we enter flush() at shutdown, then staleWriting_ will be true (line 89). So we'll skip calling doStaleWrtie() ourselves and we'll let the other thread do our job for us. However when we exit this function all of the stale cache should be flushed to the database. The loop makes sure we don't leave the function until the stale cache is all safely in the database.

I had hoped the comments were sufficient to describe what we're up to. but maybe not. @wilsonianb, do you have a suggested change for the comments?

ok, that makes sense to me.
I'd vote for this being clearly conveyed in the comment:

when we exit this function all of the stale cache should be flushed to the database. The loop makes sure we don't leave the function until the stale cache is all safely in the database.

wilsonianb · 2017-04-22T02:43:03Z

src/ripple/app/consensus/RCLValidations.cpp

+    bool shouldRelay = false;
+
+    // only add trusted or listed
+    if (val->isTrusted() || pubKey)


I think this could just be if (pubKey) especially since we are using pubKey below.

wilsonianb · 2017-04-22T03:07:25Z

src/ripple/app/consensus/RCLValidations.cpp

+    {
+        JLOG(j.debug()) << "Val for " << hash << " from "
+                    << toBase58(TokenType::TOKEN_NODE_PUBLIC, signer)
+                    << " not added UNtrustesd/";


UNtrustesd/ --> UNtrusted

bachase · 2017-04-24T18:12:55Z

Rebased on 0.70.0-b4 and addressed the latest set of comments. I extending the private byLedger and current members of Validations to allow reserving storage prior to iterating items. This does so under the same lock used for iteration. The interface is a tad more mucky, but since these are private helper members, it seems worth the trade-off.

Thanks @wilsonianb and @scottschurr for your careful reviews; the code is definitely better for it.

scottschurr

All I spotted was a comment that might need attention.

I like how current<> requires a lambda to do the reserve(). It means every caller of current<> must at least think about that issue. 👍

scottschurr · 2017-04-25T16:12:12Z

src/ripple/app/consensus/RCLValidations.cpp

@@ -199,11 +202,11 @@ handleNewValidation(Application& app,
    bool shouldRelay = false;

    // only add trusted or listed


Is this comment still correct since we're no longer checking val->isTrusted() on the following line?

Yes, still true. pubKey is only set if the validation is trusted or listed.

wilsonianb

There are few fee and amendment files that could include STValidation.h instead of RCLValidations.h
This worked for me
wilsonianb@78eaa98

wilsonianb · 2017-04-26T20:55:25Z

src/ripple/app/misc/Validations.cpp

-                    soci::transaction tr(*db);
-                    for (auto it: vector)
-                    {
-                        s.erase ();


This erase wasn't carried over to doStaleWrite. Do we not need it?

Nice catch!!!!

wilsonianb · 2017-04-26T22:08:45Z

src/ripple/consensus/Validations.h

+            boost::none,
+            [&](std::size_t) {}, // nothing to reserve
+            [&](NodeKey const&, Validation const& v) {
+                if (v.trusted() && v.isPreviousLedgerID(ledgerID))


This isn't new and doesn't have to be addressed here, but it seems like the way we handle a validation's previous ledger id could be improved.
The previous ledger id appears to only be set when we replace a validator's last seen current validation with the new one.
https://github.com/ripple/rippled/pull/2084/files#diff-46f83c3cd15dd825b63068c0ab4dea00R423
This means that if we receive a validation from a validator for the first time (or after not seeing one in the past 5 minutes), the previous ledger id would not be set and the validation would not be counted in getNodesAfter or currentTrustedDistribution.

Good point. I think that may be the best we can do though, since we have a gap in the validation stream from that validator?

One option would be for getNodesAfter and currentTrustedDistribution to take a ledger sequence instead of a LedgerID.
In both cases, we don't care if there was actually a validation for the previous ledger. We only care that a validation is for ledger that follows a particular ledger.

Without the LedgerID, wouldn't we be counting ledgers that are potentially on a different ledger chain? At least, for currentTrustedDistrubution, I think care about that distinction.

Yeah, that would change the assumption about what's being counted in currentTrustedDistrubution. The problem is that the count under the current assumption (same ledger chain) can be inaccurate if we don't set the previous ledger id for a validation. It seems like we have the information to determine that a lone validation validates a ledger id whose sequence is immediately after a given sequence. It would just be a matter of finding the least messy way of figuring that out.

wilsonianb · 2017-04-26T23:02:40Z

src/ripple/consensus/Validations.h

+                    maybeStaleValidation.emplace(std::move(oldVal));
+                    // Replace old val in the map and set the previous ledger ID
+                    ins.first->second = val;
+                    ins.first->second.setPreviousLedgerID(oldID);


In the rare case that we are replacing an identical validation from a signing revoked key, I think the previous ledger id should actually be set to the previous ledger id of the validation being replaced.
This could be changed to something like:
wilsonianb@fcf2f4f#diff-46f83c3cd15dd825b63068c0ab4dea00R427

That makes sense. I'll cherry-pick with a modification to ensure we determine oldID beforre moving out the oldVal.

wilsonianb · 2017-04-26T23:06:09Z

src/ripple/consensus/Validations.h

+    /** Expire old validation sets
+
+        Remove validation sets that were accessed more than
+        VALIDATION_SET_EXPIRES ago.


VALIDATION_SET_EXPIRES --> validationSET_EXPIRES

wilsonianb · 2017-04-26T23:54:53Z

src/ripple/app/consensus/RCLValidations.cpp

+        // This ensures that all validations are written upon return from
+        // this function.
+
+        while (staleWriting_)


Now that I understand the purpose of this loop, it could be put in an else

wilsonianb · 2017-04-27T03:43:58Z

src/test/consensus/Validations_test.cpp

+
+                // A new id is needed since the signing key changed
+                BEAST_EXPECT(
+                    AddOutcome::sameSeq == harness.add(a, Seq{2}, ID{20}));


Could you also test adding a ledger with a new signing key and the same sequence and id?

wilsonianb · 2017-04-27T03:48:29Z

src/test/consensus/Validations_test.cpp

+            }
+
+            {
+                // Processing validations out of order should ignore the older


Could you also test the following:

Current validation followed by validation with later sequence but earlier sign (and seen?) time

Current validation followed by validation with lower sequence (but never before seen id) and later sign (and seen?) time

wilsonianb · 2017-04-27T03:52:55Z

src/test/consensus/Validations_test.cpp

+
+        BEAST_EXPECT(AddOutcome::current == harness.add(a, Seq{1}, ID{1}));
+
+        BEAST_EXPECT(harness.stale().empty());


Maybe add another call to harness.vals().currentTrusted(); before this check

wilsonianb · 2017-04-27T04:00:40Z

src/test/consensus/Validations_test.cpp

+
+        // Only a is trusted
+        BEAST_EXPECT(harness.vals().currentTrusted().size() == 1);
+        BEAST_EXPECT(harness.vals().currentTrusted()[0].ledgerID() == ID{1});


If a different ID is used for b, it is clearer that the trusted validation is from a

wilsonianb · 2017-04-27T04:19:47Z

src/test/consensus/Validations_test.cpp

+        Node a = harness.makeNode();
+
+        BEAST_EXPECT(AddOutcome::current == harness.add(a, Seq{1}, ID{1}));
+        BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{1}));


Consider adding another call to harness.vals().expire() before this check

bachase · 2017-04-27T15:35:59Z

I will squash and rebase after the final 👍 . I also tested running rippled for a few minutes and looking at the validations in the sqlite database.

wilsonianb · 2017-04-27T16:03:55Z

src/test/consensus/Validations_test.cpp

@@ -480,6 +486,14 @@ class Validations_test : public beast::unit_test::suite

                BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{2}) == 0);
                BEAST_EXPECT(harness.vals().numTrustedForLedger(ID{20}) == 1);
+
+                // A new key, but re-issue the same validation, since it is the
+                // same, it is considered a repeat


I think we want this be AddOutcome::current for this case:
#2084 (comment)
We would need to change this check in Validations::add
https://github.com/ripple/rippled/pull/2084/files#diff-46f83c3cd15dd825b63068c0ab4dea00R383
to something like

auto const ret = byLedger_[id].emplace(key, val); if (! ret.second && ret.first->second.key() == val.key()) return AddOutcome::repeat;

Good call. I should've noticed that too.

FYI, there was an additional change for this.

wilsonianb · 2017-04-27T16:05:04Z

src/test/consensus/Validations_test.cpp

            }
+            {
+                // Processs validations out of order with shifted times


Processs --> Process

wilsonianb · 2017-04-27T16:14:05Z

src/test/consensus/Validations_test.cpp

+                BEAST_EXPECT(
+                    AddOutcome::current == harness.add(a, Seq{8}, ID{8}));
+
+                // Process a validation that has "later" seq but early sign time


Interesting how sign time trumps ledger sequence, but it makes sense.

bachase · 2017-04-27T18:06:41Z

Rebased and squashed after addressing last set of @wilsonianb's comments.

scottschurr · 2017-04-27T18:14:30Z

Build error in CircleCI:

In file included from src/ripple/unity/app_consensus.cpp:21:
In file included from src/ripple/app/consensus/RCLConsensus.cpp:22:
In file included from src/ripple/app/consensus/RCLValidations.h:24:
src/ripple/consensus/Validations.h:422:47: error: no member named
      'insert_or_assign' in 'std::unordered_map<ripple::PublicKey,
      ripple::RCLValidation, beast::uhash<beast::spooky>,
      std::equal_to<ripple::PublicKey>, std::allocator<std::pair<const
      ripple::PublicKey, ripple::RCLValidation> > >'
                                validationMap.insert_or_assign(key, val);
                                ~~~~~~~~~~~~~ ^
src/ripple/app/consensus/RCLValidations.cpp:210:44: note: in instantiation of
      member function 'ripple::Validations<ripple::RCLValidationsPolicy,
      ripple::RCLValidation, std::mutex>::add' requested here
        AddOutcome const res = validations.add(*pubKey, val);
                                           ^

bachase · 2017-04-27T18:17:09Z

Argg, C++17 support strikes again!

wilsonianb · 2017-04-27T19:25:53Z

src/ripple/consensus/Validations.h

+                    val.key() != oldVal.key())
+                {
+                    // This is either a newer validation or a new signing key
+                    LedgerID const oldID = [&]() {


Maybe rename oldID as prevID

wilsonianb · 2017-04-27T19:49:01Z

src/ripple/app/consensus/RCLValidations.cpp

+                           << seq;
+        }
+
+        const bool currVal = res == AddOutcome::current;


I think we also want to call checkAccept and relay for sameSeq

If a validator publishes a second validation for the same sequence (different ledger id), we should call checkAccept and relay

If a validator changes keys and publishes a new validation for the same sequence (same ledger id), we should at least relay. (Calling checkAccept should be inconsequential.)

wilsonianb

LGTM 👍
I'm still in favor of ditching ValidationCounts/highNode if @JoelKatz approves.
#2084 (comment)

bachase · 2017-05-01T12:45:25Z

Thanks @wilsonianb and @scottschurr for your thorough reviews. This is a much better change now.

I'll consider the ValidationsCounts/highNode change in a separate PR.

bachase · 2017-05-18T13:27:57Z

Squashed and rebased on 0.70.0-b6

bachase · 2017-06-14T18:19:20Z

Rebased onto 0.70.0-b8 and added the changest from #2094 to avoid merge conflicts when putting into a future release.

Introduces a generic Validations class for storing and querying current and recent validations. Aditionally migrates the validation related timing constants from LedgerTiming to the new Validations code. Introduces RCLValidations as the version of Validations adapted for use in the RCL. This adds support for flushing/writing validations to the sqlite log and also manages concurrent access to the Validations data. RCLValidations::flush() no longer uses the JobQueue for its database write at shutdown. It performs the write directly without changing threads.

bachase · 2017-07-10T13:29:50Z

Rebased on 0.70.1

seelabs · 2017-07-12T14:31:44Z

Merged into 0.80.0-b1

bachase assigned JoelKatz, wilsonianb and scottschurr Apr 11, 2017

bachase requested review from JoelKatz, wilsonianb and scottschurr April 11, 2017 16:59

bachase force-pushed the validations branch from 2bb2685 to 8cd38b7 Compare April 11, 2017 17:01

scottschurr suggested changes Apr 12, 2017

View reviewed changes

wilsonianb suggested changes Apr 17, 2017

View reviewed changes

bachase commented Apr 18, 2017

View reviewed changes

scottschurr approved these changes Apr 20, 2017

View reviewed changes

wilsonianb suggested changes Apr 24, 2017

View reviewed changes

scottschurr mentioned this pull request Apr 24, 2017

Validations::flush() doesn't use JobQueue (RIPD-1356): #2083

Closed

bachase force-pushed the validations branch from 474d4f4 to 50184ce Compare April 24, 2017 18:09

scottschurr approved these changes Apr 25, 2017

View reviewed changes

bachase mentioned this pull request Apr 25, 2017

Expose consensus parameters for simulation (RIPD-1355) #2094

Closed

wilsonianb reviewed Apr 27, 2017

View reviewed changes

bachase force-pushed the validations branch from 70ac27c to 231984b Compare April 27, 2017 18:05

bachase force-pushed the validations branch from 231984b to 9f0bc05 Compare April 27, 2017 18:28

wilsonianb reviewed Apr 27, 2017

View reviewed changes

wilsonianb approved these changes Apr 28, 2017

View reviewed changes

bachase added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label May 1, 2017

bachase force-pushed the validations branch from f7746a2 to a36369c Compare May 18, 2017 13:27

bachase added Rebase Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. and removed Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. labels Jun 2, 2017

bachase force-pushed the validations branch from a36369c to 0eb4716 Compare June 14, 2017 18:17

bachase removed the Rebase label Jun 14, 2017

bachase force-pushed the validations branch from 0eb4716 to 6ff2313 Compare June 22, 2017 16:12

bachase added 2 commits July 10, 2017 08:55

Expose consensus parameters for simulation (RIPD-1355)

1438ae3

bachase force-pushed the validations branch from 6ff2313 to 1438ae3 Compare July 10, 2017 13:29

seelabs closed this Jul 12, 2017

wilsonianb mentioned this pull request Jul 13, 2017

Use ledger hash to break ties (RIPD-1468) #2169

Closed

bachase mentioned this pull request Dec 27, 2017

Prefer ledger by branch #2300

Closed

wilsonianb mentioned this pull request Jul 5, 2018

Remove TODO in DisputedTx.h about renaming constants for ConsensusParms #2614

Closed


		Validations::flush();
		Validations::flush([&](RCLValidation && v)

		@@ -240,24 +240,45 @@ RCLValidations::onStale(RCLValidation&& v)
		return;


		/** Wrapper over STValidation for generic Validation code

		Wraps an STValidation::pointer for compatiblity with the generic validation


		};

		/** Implements the StalePolicy policy class for adapating Validations in the RCL


		beast::Journal j_;

		//! StalePolicy details providing now(),onStale() and flush callbacks

		@@ -199,11 +202,11 @@ handleNewValidation(Application& app,
		bool shouldRelay = false;

		// only add trusted or listed


		BEAST_EXPECT(AddOutcome::current == harness.add(a, Seq{1}, ID{1}));

		BEAST_EXPECT(harness.stale().empty());

Validations refactor (RIPD-1412) #2084

Validations refactor (RIPD-1412) #2084

Conversation

bachase commented Apr 11, 2017

bachase commented Apr 11, 2017

codecov-io commented Apr 11, 2017 • edited Loading

Codecov Report

scottschurr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bachase commented Apr 13, 2017 • edited Loading

wilsonianb left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bachase commented Apr 18, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bachase commented Apr 18, 2017

wilsonianb commented Apr 18, 2017

scottschurr commented Apr 18, 2017

scottschurr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bachase commented Apr 24, 2017

scottschurr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Apr 11, 2017 •

edited

Loading

bachase commented Apr 13, 2017 •

edited

Loading