-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deterministic shards, ver 2.0 #3595
Conversation
|
||
For example, the following is an example of sorted hashes in their hex representation: | ||
``` | ||
0000000000000000000000000000000000000000000000000000000000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What value would be stored at that key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so called final key of the shard. Please refer to rippled code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarkusTeufelberger The value stored in the final key is the shard version, first ledger sequence, last ledger sequence and the hash of the ledger with the highest sequence (last ledger).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to sort the node hashes? When a deterministic shard is written, we iterate through the node objects in a determined manner. The hash order isn't important, the sequence in which they are written is. Maybe I missed something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that the prior sorting functionality was removed, excellent. Please update the MD file and code comments to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please discuss with Miguel Portilla directly. I do not understand what do you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation for that "natural SHAmap tree traversal order" - which MD file are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if such a documentation exists. Last fix to Deterministic shards MD which mentions "natural order" was made to reflect the code changes suggested by this commit of Miguel Portilla. Prior to this change there was no need to mention "natural order". So I still recommend discuss this problem with him.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miguelportilla: can you please follow up with @MarkusTeufelberger. We should document the precise semantics so that the shard format can be parsed/generated by third-party implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added description of the order in which objects are visited.
|
||
Then, `digest(i)` is defined as the following portion of the above hash `H`: | ||
``` | ||
digest(0) = H[0] << 56 | H[2] << 48 | ... | H[14] << 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why all this interleaving again instead of just cutting the RIPEMD160 hash in 3 parts? This sounds needlessly complex, either you trust that this is a cryptographically secure hash (then you don't need this obfuscation) or you don't (then there are other ways to get a 32 and 2 64 bit numbers from 384 bits of input).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I documented here the piece of code proposed by Nik Bougalis. RIPEMD160 is also his proposition. Please refer to
#3363.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* 3. The flush() method stores all objects from memory pool to the shard | ||
* located in tempdir_ in sorted order. | ||
* 4. The close(true) method finalizes the shard and moves it from tempdir_ | ||
* temporary folder to filandir_ permanent folder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* temporary folder to filandir_ permanent folder, | |
* temporary folder to finaldir_ permanent folder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a pass, left some comments.
src/ripple/nodestore/impl/Shard.cpp
Outdated
@@ -392,7 +393,8 @@ Shard::isLegacy() const | |||
bool | |||
Shard::finalize( | |||
bool const writeSQLite, | |||
boost::optional<uint256> const& expectedHash) | |||
boost::optional<uint256> const& expectedHash, | |||
const bool writeDeterministicShard) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually don't put const
on value params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the const
is still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/ripple/nodestore/impl/Shard.h
Outdated
*/ | ||
bool | ||
finalize( | ||
bool const writeSQLite, | ||
boost::optional<uint256> const& referenceHash); | ||
boost::optional<uint256> const& referenceHash, | ||
const bool writeDeterministicShard = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually leave const
off of value params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the const
is still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
static_assert( | ||
Generator::min() == 0, "If non-zero we have handle the offset"); | ||
const resultType range = b - a + 1; | ||
assert(Generator::max() >= range); // Just for safety |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider static_assert
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't. range is not constexpr.
@cdy20 I'd reword the commit to remove the |
This commit message was proposed by Nik Bougalis. |
|
||
For example, the following is an example of sorted hashes in their hex representation: | ||
``` | ||
0000000000000000000000000000000000000000000000000000000000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that the prior sorting functionality was removed, excellent. Please update the MD file and code comments to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I left two minor nits, but they are minor enough I think it's OK to approve this now.
Also, please do change the commit message. The This commit, if merged,
is implied. We don't want every single commit message to start that way.
After deterministic shard is created using the above mentioned headers, | ||
it filled with objects using the following steps. | ||
|
||
1. All ledgers of the shard are visited in descendent order (from high |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
descendend
-> descending
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
2231553FC01D37A66C61BBEEACBB8C460994493E5659D118E19A8DDBB1444273 | ||
272DCBFD8E4D5D786CF11A5444B30FB35435933B5DE6C660AA46E68CF0F5C447 | ||
3C062FD9F0BCDCA31ACEBCD8E530D0BDAD1F1D1257B89C435616506A3EE6CB9E | ||
58A0E5AE427CDDC1C7C06448E8C3E4BF718DE036D827881624B20465C3E1334F |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that with thie example data we'd get the same sort order if we were sorting from the low order bits or the high order bits. It would be better to choose example data that sorted differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Done. |
@@ -517,6 +517,7 @@ target_sources (rippled PRIVATE | |||
src/ripple/nodestore/impl/DatabaseNodeImp.cpp | |||
src/ripple/nodestore/impl/DatabaseRotatingImp.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lovely
// Close SQL databases for original shard | ||
lgrSQLiteDB_.reset(); | ||
txSQLiteDB_.reset(); | ||
acquireInfo_.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to reset acquireInfo_
again, we just reset it at line 861.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/ripple/nodestore/impl/Shard.cpp
Outdated
pCache_->reset(); | ||
nCache_->reset(); | ||
app_.getShardFamily()->getFullBelowCache(lastSeq_)->reset(); | ||
app_.getShardFamily()->getTreeNodeCache(lastSeq_)->reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to reset these as they were just cleared in the preceding while loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/ripple/nodestore/impl/Shard.cpp
Outdated
return fail("failed to initialize SQLite databases"); | ||
|
||
setFileStats(lock); | ||
lastAccess_ = std::chrono::steady_clock::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs refactoring.
If detShard
is valid then it is not necessary to call initSQLite
, setFilStats
here, that will be done in the call to open
.
eg.
if (detShard)
{
detShard->store(nodeObject);
// Do not allow all other threads work with the shard
busy_ = true;
// Wait until all other threads leave the shard
while (backendCount_ > 1)
std::this_thread::yield();
// Obtain the mutex
std::lock_guard lock(mutex_);
// Close original shard database
backend_->close();
// Close SQL databases for original shard
lgrSQLiteDB_.reset();
txSQLiteDB_.reset();
// Remove the acquire SQLite database
if (acquireInfo_)
{
acquireInfo_.reset();
remove_all(dir_ / AcquireShardDBName);
}
// Close deterministic shard
detShard->close();
// Replace original shard by deterministic one.
detShard->replace();
// Re-open deterministic shard
if (!open(lock))
return false;
// Allow all other threads work with the shard
busy_ = false;
lastAccess_ = std::chrono::steady_clock::now();
}
else
{
std::lock_guard lock(mutex_);
// Remove the acquire SQLite database
if (acquireInfo_)
{
acquireInfo_.reset();
remove_all(dir_ / AcquireShardDBName);
}
if (!initSQLite(lock))
return fail("failed to initialize SQLite databases");
setFileStats(lock);
lastAccess_ = std::chrono::steady_clock::now();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/ripple/nodestore/impl/Shard.cpp
Outdated
*/ | ||
if (!finalizeInternal(writeSQLite, expectedHash, true)) | ||
return false; | ||
if (!finalizeInternal(false, expectedHash, false)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason we verify the deterministic shard again? I don't see this being necessary. Unless I am missing something, the third parameter can be removed and the deterministic shard should always be created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cdy20 Here is a branch with the changes I mentioned above. Feel free to cherry-pick or copy etc... https://github.com/miguelportilla/rippled/commits/d_shard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
For example, the following is an example of sorted hashes in their hex representation: | ||
``` | ||
0000000000000000000000000000000000000000000000000000000000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miguelportilla: can you please follow up with @MarkusTeufelberger. We should document the precise semantics so that the shard format can be parsed/generated by third-party implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to run this on my dogfood machine over the weekend and there are some issues I need to look into. In particular:
- Starting from an empty db and shards directory, I got a crash:
terminate called after throwing an instance of 'ripple::SHAMapMissingNode'
what(): Missing Node: State Tree: hash 2FDC38D4329B2CBF3B6D9899514E461AAF11A1DE6E21C94E3AE4F6B087679031
On a previous run, it was acquiring a shard (say XXX), and when I checked back later it was acquiring a different shard (say YYY), and the first shard (XXX) was no longer there. I saw the following log messages:
2020-Nov-21 01:28:20.846684223 UTC ShardStore:FTL shard 370. Exception caught in function initSQLite. Error: sqlite3_statement_backend::prepare: database is locked while preparing "PRAGMA journal_mode=wal;".
2020-Nov-21 04:42:25.112339696 UTC ShardStore:FTL shard 2586. can't create deterministic shard. Ledger hash 44B13514A162C4A91025434F7B7481897518AEF56F2ECB2A4855216BE833D9D6
2020-Nov-21 08:34:36.126196421 UTC ShardStore:FTL shard 1677. can't create deterministic shard. Ledger hash 9CABB88EE610B2CB32F32EF48F656F5E141DD51A168C7742EF184402FF4787D1
2020-Nov-21 11:00:51.510482384 UTC ShardStore:FTL shard 385. can't create deterministic shard. Ledger hash A057937D4B3CDE6AD3F47F1EB75DA40394438FC077A9ADAD71D69ACCC40B9DB3
Edit: Ignore the second error (acquiring shard XXX). I ran it on an old branch (5f3692399c
). However, the first issue was run on the correct branch (f4e6b58d7
). (They were run on different machines).
@cdy20 I just noticed that that machine was running on an old branch (5f36923). So let's ignore this issue (the FTL log messages, the However, the machine that threw was running on the latest |
@cdy20 Lol. It's just a machine separate from my regular development machine that I keep rippled running on. It comes from the idiom "eat your own dogfood".
The |
I launched rippled on mainnet 3 times from empty db on 2 different machines. Still can not reproduce this issue. 10 deterministic shards successfully collected in total. |
@cdy20 Thanks for trying to reproduce. I suspect this is going to be a hard bug to track down. Since this is a rare bug, we don't even know for sure this is caused by shards. Since we can't reproduce it, let's give some thought into any changes that could have caused this. In the meantime, let continue to run this on our dogfood machines. |
Depends on #3683 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 assuming Mickey's concerns are all addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Levelization check failed here. I don't know if this is right behaviour. |
Codecov Report
@@ Coverage Diff @@
## develop #3595 +/- ##
===============================
===============================
Continue to review full report at Codecov.
|
Add support to allow multiple indepedent nodes to produce a binary identical shard for a given range of ledgers. The advantage is that servers can use content-addressable storage, and can more efficiently retrieve shards by downloading from multiple peers at once and then verifying the integrity of a shard by cross-checking its checksum with the checksum other servers report.
This commit, if merged, adds support to allow multiple indepedent nodes to
produce a binary identical shard for a given range of ledgers. The advantage
is that servers can use content-addressable storage, and can more efficiently
retrieve shards by downloading from multiple peers at once and then verifying
the integrity of a shard by cross-checking its checksum with the checksum
other servers report.
Deterministic Database Shards
This doc describes the standard way to assemble the database shard.
A shard assembled using this approach becomes deterministic i.e.
if two independent sides assemble a shard consisting of the same ledgers,
accounts and transactions, then they will obtain the same shard files
nudb.dat
andnudb.key
. The approach deals with theNuDB
databaseformat only, refer to
https://github.com/vinniefalco/NuDB
.Headers
Due to NuDB database definition, the following headers are used for
database files:
nudb.key:
nudb.dat:
All of these fields are saved using network byte order
(bigendian: most significant byte first).
To make the shard deterministic the following parameters are used
as values of header field both for
nudb.key
andnudb.dat
files.Note: XXH64() is well-known hash algorithm.
The
digest(i)
mentioned above defined as the follows:First, RIPEMD160 hash
H
calculated of the following structure(the same as final Key of the shard):
there all 32-bit integers are hashed in network byte order
(bigendian: most significant byte first).
Then,
digest(i)
is defined as the following part of the above hashH
:where
H[i]
denotesi
-th byte of hashH
.Contents
After deterministic shard is created using the above mentioned headers,
it filled with objects using the following steps.
All objects within the shard are visited in the order described in the
next section. Here the objects are: ledger headers, SHAmap tree nodes
including state and transaction nodes, final key.
Set of all visited objects is divided into groups. Each group except of
the last contains 16384 objects in the order of their visiting. Last group
may contain less than 16384 objects.
All objects within each group are sorted in according to their hashes.
Objects are sorted by increasing of their hashes, precisely, by increasing
of hex representations of hashes in lexicographic order. For example,
the following is an example of sorted hashes in their hex representation:
sorted order within each group from low to high hashes.
Order of visiting objects
The shard consists of 16384 ledgers and the final key with the hash 0.
Each ledger has the header object and two SMAmaps: state and transaction.
SHAmap is a rooted tree in which each node has maximum of 16 descendants
enumerating by indexes 0..15. Visiting each node in the SHAmap
is performing by functions visitNodes and visitDifferences implemented
in the file
ripple/shamap/impl/ShaMapSync.cpp
.Here is how the function visitNodes works: it visit the root at first.
Then it visit all nodes in the 1st layer, i. e. the nodes which are
immediately descendants of the root sequentially from index 0 to 15.
Then it visit all nodes in 2nd layer i.e. the nodes which are immediately
descendants the nodes from 1st layer. The order of visiting 2nd layer nodes
is the following. First, descendants of the 1st layer node with index 0
are visited sequintially from index 0 to 15. Then descendents of 1st layer
node with index 1 are visited etc. After visiting all nodes of 2nd layer
the nodes from 3rd layer are visited etc.
The function visitDifferences works similar to visitNodes with the following
exceptions. The first exception is that visitDifferences get 2 arguments:
current SHAmap and previous SHAmap and visit only the nodes from current
SHAmap which and not present in previous SHAmap. The second exception is
that visitDifferences visits all non-leaf nodes in the order of visitNodes
function, but all leaf nodes are visited immedeately after visiting of their
parent node sequentially from index 0 to 15.
Finally, all objects within the shard are visited in the following order.
All ledgers are visited from the ledger with high index to the ledger with
low index in descending order. For each ledger the state SHAmap is visited
first using visitNode function for the ledger with highest index and
visitDifferences function for other ledgers. Then transaction SHAmap is
visited using visitNodes function. At last, the ledger header object is
visited. Final key of the shard is visited at the end.
Tests
To perform test to deterministic shards implementation one can enter
the following command:
The following is the right output of deterministic shards test: