
Implement history shards (RIPD-1289) #2258

Closed (4 commits)

Conversation

@miguelportilla (Contributor) commented Nov 4, 2017

Introduce ledger history shards.

Ripple sharding is a way of distributing historical data. rippled can easily be configured to begin storing ledger history in shards, and shard data is automatically shared with peers via the ledger acquire process. It is worth noting that shards do not replace the node store: although redundant, it is entirely possible to hold full history in both the node store and the shard store. However, an effective configuration might limit the node store to recent history only. With the current implementation, each shard stores 2^14 ledgers, which may not be optimal. Determining a suitable value will involve carefully weighing different aspects of the process. For one, the node store history size should be at minimum twice the ledgers per shard, because the current shard may be chosen to be stored and it would be wasteful to reacquire the data. Another consideration is the unit of work performed when the server decides to retain the current shard. The time to acquire, the number of file handles, and memory cache usage are also affected.

Acquiring shards begins after synchronizing with the network and backfilling the ledger. During this time of lower network activity, the shard database may be asked to select a shard to acquire. If a shard is selected, the ledger acquire process begins with the sequence of the last ledger in the shard and works backward to the first. Shards will continue to be acquired until the maximum allocated disk space for shards is reached, at which point the current shard may replace an older shard.

Like the node store, the shard store derives from the base class Database. The common interface makes it easy to substitute one store for the other when storing or fetching from various places in the project. For instance, synchronization filters are used by the inbound ledger process to save data received from peers to a store; specifying the database object for the filter allows us to choose where the data is stored. Similarly, SHAMaps are told through a Family which database to use when fetching or storing. The derived stores themselves rely on this commonality when copying ledgers from one store to another or validating their state and transaction trees.

The following is a simplified, high-level explanation of the process involved in storing shards.

LedgerMaster asks DatabaseShard for a ledger to acquire. If a ledger is requested, InboundLedgers is asked to acquire it. If it is not stored locally, InboundLedgers asks the Overlay to request the ledger from peers. As data is received from peers, it is assembled and stored. Once all data pertaining to the ledger has been received, the DatabaseShard is notified.

The focus of this review should be the 'Implement DatabaseShard' commit. I apologize in advance for its size, but slicing it smaller proved difficult because the dependencies are tightly interleaved. I will do my best to answer questions and provide clarity where needed.

@sublimator (Contributor):

Interesting!

@bachase (Collaborator) left a comment:

Partial review progress, focusing on Shard and DatabaseShard. Generally looks good, but it would be nice to have unit tests covering those classes.

Looks like the AppVeyor build failed due to missing boost serialization lib.

CMakeLists.txt (outdated)
else()
set(BOOST_ROOT ${Boost_INCLUDE_DIRS})
Reviewer:

Was moving this set intentional?

Author:

I can't find a reason for this change, will revert.

//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2012, 2013 Ripple Labs Inc.
Reviewer:

Nit: stale dates?

Author:

Will fix.

int cacheAge, beast::Journal& j)
: index_(index)
, firstSeq_(std::max(genesisSeq, detail::firstSeq(index)))
, lastSeq_(detail::lastSeq(index))
Reviewer:

Should this be std::max(genesisSeq, detail::lastSeq(index))?

Author:

Yes, it should be std::max(firstSeq_, detail::lastSeq(index)).

namespace NodeStore {

Shard::Shard(std::uint32_t index, int cacheSz,
int cacheAge, beast::Journal& j)
Reviewer:

Should cacheAge be a std::chrono::duration?

Author:

The TaggedCache ctor takes a rep and converts it to a duration. I'll change shard to the same type but the TaggedCache ctor is beyond the PR scope.

return false;
}

if (backend_->fdlimit() == 0)
Reviewer:

What does an fdlimit of 0 mean? It's not a failure in opening the shard?

Author:

Some of the backends do not use files and therefore have no file handle requirements, e.g. MemoryBackend and NullBackend. They can be used in a unit test.


private:
Application& app_;
mutable std::mutex m_;
Reviewer:

I don't see all member functions taking this lock (init, prepare). Do only some members need protection?

Author (Nov 10, 2017):

prepare takes the lock. There are a couple of private functions that assume the lock is held and that is noted in the comments. init doesn't as it should only be called once and before anything else. I will lock and throw if erroneously called more than once. Similar thing for validate. Thanks!

Reviewer:

👍 One option that I learned from @scottschurr is to have those other private functions take the lock by reference, making it clearer to readers that the lock is already held. I trust the code more than the comments ;). Not necessary, just for your consideration.

Author:

Thanks for the suggestion, will add.

return {};
}
ledger->setFull();
//ledger->setImmutable(app_.config());
Reviewer:

Is this commented-out line intended?

Author (Nov 21, 2017):

Unnecessary, removed.

return fetchInternal(hash, *backend);
}

// Lock must be held
Reviewer:

I don't see the lock acquired prior to calling.

Author:

findShardIndexToAdd is called from prepare. The first line in prepare takes the lock.

std::uint32_t
seqToShardIndex(std::uint32_t const seq)
{
return (seq - 1) / ledgersPerShard;
Reviewer:

Not a sequence number we should see, but if seq=0 this will not return the expected result.

Author:

True, I'll add an assert.

assert(numShards <= maxShardIndex + 1);

// If equal, have all the shards
if (numShards >= maxShardIndex + 1)
Reviewer:

This comparison seems inconsistent with the assert above.

@nbougalis (Contributor) left a comment:

I’m still working through this. Left some minor comments, will focus on the logic next.

prevMissing(mCompleteLedgers, mPubLedger->info().seq);
ScopedLockType sl(mCompleteLock);
missing = prevMissing(mCompleteLedgers,
mPubLedger->info().seq, std::uint32_t(32600));
Reviewer:

This should be a symbolic constant, like earliestAvailableLedger or somesuch.

"Invalid [shard_db] configuration";
return false;
}
shardStore_->validate();
Reviewer:

Shouldn’t this just return true or false?

Author:

I am not sure a boolean return would be very useful. One or more shards may fail to validate, and that is logged, but it is up to the user to take action. In the unlikely event that a shard is corrupt, they must delete the shard, same as with the node store. If it is missing a node (which should never happen on a complete shard), it will auto-heal through SHAMap triggering the acquisition of the missing nodes.

namespace ripple {
namespace NodeStore {

class DatabaseNode : public Database
Reviewer:

The purpose of this class is unclear to me. Why do we need it?

Author:

We don't, will fix.

*/
virtual
bool
hasLedger(std::uint32_t seq) = 0;
Reviewer:

No change needed, just food for thought. Should we prefer generic names like contains over something like hasLedger?

Author:

Definitely.

@bachase (Collaborator) left a comment:

Finished first pass on all the changes. Looks great overall.

@@ -180,7 +181,7 @@ class LedgerMaster
LedgerIndex ledgerIndex);

boost::optional <NetClock::time_point> getCloseTimeByHash (
LedgerHash const& ledgerHash);
LedgerHash const& ledgerHash, std::uint32_t index);
Reviewer:

Consider changing std::uint32_t index to LedgerIndex index to match the rest of this file.

{
JLOG(m_journal.trace())
<< "fetchForHistory want fetch pack " << missing;
fetch_seq_ = missing;
Reviewer:

Just trying to follow the larger flow here, but does fetch_seq_ = missing imply only one ledger is being fetched at a time?

Author:

fetch_seq_ simply tracks the ledger sequence of the last fetch pack requested. Its purpose is to prevent requesting the same fetch pack more than once. Generally, more than one ledger is fetched at a time. For instance, if the ledger isn't present locally and there isn't a fetch pack available, we request the prior ten ledgers at once.

Reviewer:

So if someone calls

fetchForHistory(2);
fetchForHistory(1);
fetchForHistory(2);
fetchForHistory(1);

Does that means multiple fetches will happen for the same ledger?

Author (Nov 14, 2017):

I am glad you asked, as it made me realize fetchForHistory should be private! The code in the function was originally in doAdvance, and I broke it out because it was too tall. doAdvance is called from the job queue, and it can and will call fetchForHistory with a prior sequence. That is by design, and it won't hurt anything, as InboundLedgers::acquire maintains only one fetch operation per ledger. So multiple fetches will not happen for the same ledger, no matter how many times it's called.

if (inbound && inbound->isComplete ())
return inbound->getLedger();
return {};
if (inbound->isFailed())
Reviewer:

Previously there was a check that inbound was not null. No longer needed?

Author:

Not needed.

mHash << " cannot be a ledger";
mFailed = true;
return true;
deserializeHeader(makeSlice(node->getData()), true),
Reviewer:

This branch looks very similar to lines 293-307. Consider making a local lambda to combine them.

Author:

Will do.

@@ -87,6 +87,8 @@ class SHAMap
mutable SHAMapState state_;
SHAMapType type_;
bool backed_ = true; // Map is backed by the database
bool full_ = false; // Indicates all nodes should reside in a local store.
Reviewer:

How should I think of full_ in relation to backed_?

Author:

If backed_ is true, then full_ determines whether or not every node pertaining to this SHAMap resides in a database.

Reviewer:

The wording might be confusing. Maybe we can change the comment on full_ to "Map is believed complete in database" or something.

@@ -150,6 +158,8 @@ class SHAMap
Marked `const` because the data is not part of
the map contents.
*/
void setFull ();
Reviewer:

Is the comment above on L158-160 meant for some other member function?

Author:

Yes, will fix, thanks!

@mellery451 (Contributor) left a comment:

👍

}

boost::optional<Blob>
AccountStateSF::getNode(SHAMapHash const& nodeHash) const
AccountStateSF::getNode(SHAMapHash const& nodeHash,
std::uint32_t ledgerSeq) const
Reviewer:

Nit: should we leave the second parameter unnamed here, since it is not used?

Blob&& nodeData, SHAMapTreeNode::TNType type) const
void
ConsensusTransSetSF::gotNode(bool fromFilter, SHAMapHash const& nodeHash,
std::uint32_t ledgerSeq, Blob&& nodeData, SHAMapTreeNode::TNType type) const
Reviewer:

Same here: both ledgerSeq parameters are unused.

if (mLedger)
tryDB(mLedger->stateMap().family());
else if(mReason == Reason::SHARD)
tryDB(*app_.shardFamily());
Reviewer:

Do we need to check that app_.shardFamily() is set, or is that already guaranteed by callers?

Author:

Yes, guaranteed by the caller.

{
case Reason::SHARD:
app_.getShardStore()->setStored(mLedger);
// fall through
Reviewer:

Consider wording like this: // TODO C++17: [[fallthrough]]

void
save(Archive& ar,
ripple::ClosedInterval<T> const& ci,
const unsigned int version)
Reviewer:

version is unused - is it just an optional way to version your object internals?

Author:

Yes

case ok:
++fetchHitCount_;
if (nObj)
fetchSz_ += nObj->getData().size();
Reviewer:

//TODO c++17: [[fallthrough]] ... or just add break

if (ledger->info().hash != hash || ledger->info().seq != seq)
{
JLOG(j_.error()) <<
"shard " << std::to_string(seqToShardIndex(seq)) <<
Reviewer:

std::uint32_t is streamable, so is there any benefit to using std::to_string here?

s += std::to_string(incomplete_->index());
else
s.pop_back();
JLOG(j_.fatal()) << s;
Reviewer:

Is this a FATAL log because it's an error, or just to ensure it shows up in the log?

DatabaseShardImp::asyncFetch(uint256 const& hash,
std::uint32_t seq, std::shared_ptr<NodeObject>& object)
{
TaggedCache<uint256, NodeObject>* pCache;
Reviewer:

Consider making these 18 or so lines a selectCache function.

int
DatabaseShardImp::getDesiredAsyncReadCount(std::uint32_t seq)
{
auto const shardIndex {seqToShardIndex(seq)};
Reviewer:

That proposed selectCache function could be used here too.

@ximinez (Collaborator) commented Nov 19, 2017

The latest commit is causing all kinds of build errors.

@miguelportilla force-pushed the shard branch 5 times, most recently from 0b8ac7b to 48a64e3 on November 21, 2017 at 22:40
@bachase (Collaborator) left a comment:

👍 on the latest commits.

auto it = complete_.find(shardIndex);
if (it != complete_.end())
{
cache->first = it->second->pCache();
Reviewer:

You need to initialize the optional before assigning to the pair members, or just do cache = std::make_pair. This crashed for me when running with asserts enabled.

Author:

Good catch, fixed.

@miguelportilla force-pushed the shard branch 3 times, most recently from 44b5d3b to 19c1971 on November 28, 2017 at 21:15
cache = std::make_pair(incomplete_->pCache(),
incomplete_->nCache());
}
return std::move(cache);
Reviewer:

You don't need the std::move here.

In file included from /home/eah/dev/rippled-merge/src/ripple/unity/nodestore.cpp:32:
/home/eah/dev/rippled-merge/src/ripple/nodestore/impl/DatabaseShardImp.cpp:740:12: warning: 
      moving a local object in a return statement prevents copy elision
      [-Wpessimizing-move]
    return std::move(cache);
           ^
/home/eah/dev/rippled-merge/src/ripple/nodestore/impl/DatabaseShardImp.cpp:740:12: note: 
      remove std::move call here
    return std::move(cache);
           ^~~~~~~~~~     ~
1 warning generated.

@codecov-io commented Dec 1, 2017

Codecov Report

Merging #2258 into develop will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff            @@
##           develop    #2258   +/-   ##
========================================
  Coverage    69.64%   69.64%           
========================================
  Files          705      705           
  Lines        58426    58426           
========================================
  Hits         40691    40691           
  Misses       17735    17735

Last update dc9e9f4...a75d4ea.

@ximinez (Collaborator) left a comment:

I have not done a full review, but the issues I had with building and the move warning have been resolved. 👍

@JoelKatz (Collaborator) left a comment:

👍 I believe these changes are safe and should be merged to allow work on shards to continue and to get the network supporting the protocol changes for shards. But I do believe we should hold off advising people to enable sharding until we take care of some of the remaining missing features, particularly ensuring that enabling sharding doesn't significantly increase resource consumption.

@sublimator (Contributor) commented Dec 15, 2017 via email

@miguelportilla added the "Passed" label (Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.) on Dec 19, 2017
@ripplelabs-jenkins (Collaborator) commented Dec 19, 2017

Jenkins Build Summary

Built at 20171219 - 22:52:17

Test Results

Build Type Result Status
clang.debug.unity 970 cases, 0 failed, t: 385s PASS ✅
coverage 970 cases, 0 failed, t: 612s PASS ✅
clang.debug.nounity 968 cases, 0 failed, t: 344s PASS ✅
gcc.debug.unity 970 cases, 0 failed, t: 432s PASS ✅
gcc.debug.nounity 968 cases, 0 failed, t: 357s PASS ✅
clang.release.unity 969 cases, 0 failed, t: 467s PASS ✅
gcc.release.unity 969 cases, 0 failed, t: 501s PASS ✅

@miguelportilla force-pushed the shard branch 3 times, most recently from f46b7d3 to 35decbf on December 19, 2017 at 21:04
@scottschurr (Collaborator):

Incorporated into 0.90.0-b4 as commits 819ea46, aeda243, and 718d217.

@miguelportilla deleted the shard branch on February 23, 2018 at 23:41
Labels: Passed (Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.)

10 participants