Add support for deterministic database shards (XRPLF#2688):
Add support to allow multiple independent nodes to produce a binary identical
shard for a given range of ledgers. The advantage is that servers can use
content-addressable storage, and can more efficiently retrieve shards by
downloading from multiple peers at once and then verifying the integrity of
a shard by cross-checking its checksum with the checksum other servers report.
cdy20 authored and miguelportilla committed Dec 3, 2020
1 parent 7e8e116 commit 5183c5f
Showing 9 changed files with 861 additions and 55 deletions.
1 change: 1 addition & 0 deletions Builds/CMake/RippledCore.cmake
@@ -517,6 +517,7 @@ target_sources (rippled PRIVATE
src/ripple/nodestore/impl/DatabaseNodeImp.cpp
src/ripple/nodestore/impl/DatabaseRotatingImp.cpp
src/ripple/nodestore/impl/DatabaseShardImp.cpp
+ src/ripple/nodestore/impl/DeterministicShard.cpp
src/ripple/nodestore/impl/DecodedBlob.cpp
src/ripple/nodestore/impl/DummyScheduler.cpp
src/ripple/nodestore/impl/EncodedBlob.cpp
15 changes: 15 additions & 0 deletions src/ripple/nodestore/Backend.h
@@ -63,6 +63,21 @@ class Backend
virtual bool
isOpen() = 0;

+ /** Open the backend.
+ @param createIfMissing Create the database files if necessary.
+ @param appType Deterministic appType used to create a backend.
+ @param uid Deterministic uid used to create a backend.
+ @param salt Deterministic salt used to create a backend.
+ @throws std::runtime_error if the function is called on a non-NuDB backend.
+ */
+ virtual void
+ open(bool createIfMissing, uint64_t appType, uint64_t uid, uint64_t salt)
+ {
+ Throw<std::runtime_error>(
+ "Deterministic appType/uid/salt not supported by backend " +
+ getName());
+ }
+
/** Close the backend.
This allows the caller to catch exceptions.
*/
131 changes: 131 additions & 0 deletions src/ripple/nodestore/DeterministicShard.md
@@ -0,0 +1,131 @@
# Deterministic Database Shards

This document describes the standard way to assemble a database shard.
A shard assembled this way is deterministic: if two independent parties
assemble a shard from the same ledgers, accounts, and transactions, they
obtain identical shard files `nudb.dat` and `nudb.key`. The approach
applies to the `NuDB` database format only; see
`https://github.com/vinniefalco/NuDB`.


## Headers

The NuDB database format defines the following headers for its
database files:

nudb.key:
```
char[8] Type The characters "nudb.key"
uint16 Version Holds the version number
uint64 UID Unique ID generated on creation
uint64 Appnum Application defined constant
uint16 KeySize Key size in bytes
uint64 Salt A random seed
uint64 Pepper The salt hashed
uint16 BlockSize Size of a file block in bytes
uint16 LoadFactor Target fraction in 65536ths
uint8[56] Reserved Zeroes
uint8[] Reserved Zero-pad to block size
```

nudb.dat:
```
char[8] Type The characters "nudb.dat"
uint16 Version Holds the version number
uint64 UID Unique ID generated on creation
uint64 Appnum Application defined constant
uint16 KeySize Key size in bytes
uint8[64] (reserved) Zeroes
```
All of these fields are stored in network byte order
(big-endian: most significant byte first).

To make the shard deterministic, the following values are used for the
header fields of the `nudb.key` and `nudb.dat` files, wherever a field is
present in that file's header.
```
Version 2
UID digest(0)
Appnum digest(2) | 0x5348524400000000 /* 'SHRD' */
KeySize 32
Salt digest(1)
Pepper XXH64(Salt)
BlockSize 0x1000 (4096 bytes)
LoadFactor 0.5 (numeric 0x8000)
```
Note: XXH64() is the 64-bit variant of the xxHash hash algorithm.

The `digest(i)` values mentioned above are defined as follows.

First, the RIPEMD160 hash `H` of the following structure is calculated
(this structure is also the final key of the shard):
```
uint32 version Version of shard, 2 at the present
uint32 firstSeq Sequence number of first ledger in the shard
uint32 lastSeq Sequence number of last ledger in the shard
uint256 lastHash Hash of last ledger in shard
```
where all 32-bit integers are hashed in network byte order
(big-endian: most significant byte first).
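
As a minimal sketch of the byte layout described above, the following C++ builds the 44-byte buffer that would be passed to RIPEMD160. The names `putBE32`, `serializeFinalKey`, and `finalKeySize` are hypothetical and do not come from the commit. The sketch also assumes that the fields are concatenated in the listed order without padding and that `lastHash` is already a 32-byte string. The hash computation itself is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Three 32-bit integers plus one 256-bit hash: 3 * 4 + 32 = 44 bytes.
constexpr std::size_t finalKeySize = 3 * 4 + 32;

// Write a 32-bit integer in network byte order (most significant byte
// first) and return the position just past the written bytes.
inline std::uint8_t*
putBE32(std::uint8_t* out, std::uint32_t v)
{
    *out++ = static_cast<std::uint8_t>(v >> 24);
    *out++ = static_cast<std::uint8_t>(v >> 16);
    *out++ = static_cast<std::uint8_t>(v >> 8);
    *out++ = static_cast<std::uint8_t>(v);
    return out;
}

// Hypothetical helper: lays out version, firstSeq, lastSeq, lastHash.
// Assumption: fields are concatenated in this order with no padding.
inline std::array<std::uint8_t, finalKeySize>
serializeFinalKey(
    std::uint32_t version,
    std::uint32_t firstSeq,
    std::uint32_t lastSeq,
    std::array<std::uint8_t, 32> const& lastHash)
{
    std::array<std::uint8_t, finalKeySize> buf{};
    std::uint8_t* p = buf.data();
    p = putBE32(p, version);
    p = putBE32(p, firstSeq);
    p = putBE32(p, lastSeq);
    std::copy(lastHash.begin(), lastHash.end(), p);
    assert(p + lastHash.size() == buf.data() + buf.size());
    return buf;
}
```

Writing the bytes one at a time keeps the result independent of the host's endianness, which matters when independent machines must produce the same hash input.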

Then `digest(i)` is defined as the following part of the hash `H`:
```
digest(0) = H[0] << 56 | H[1] << 48 | ... | H[7] << 0,
digest(1) = H[8] << 56 | H[9] << 48 | ... | H[15] << 0,
digest(2) = H[16] << 24 | H[17] << 16 | ... | H[19] << 0,
```
where `H[i]` denotes `i`-th byte of hash `H`.
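
The following C++ sketch illustrates how the three digests could be extracted from `H` and combined into the header values `UID`, `Salt`, and `Appnum`. It takes the 20-byte RIPEMD160 output as given. `Hash160`, `readBE`, `HeaderParams`, `makeHeaderParams`, and `shardTypeTag` are hypothetical names chosen for illustration and are not taken from the implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// RIPEMD160 output (20 bytes); computing it is outside this sketch.
using Hash160 = std::array<std::uint8_t, 20>;

// 'SHRD' in ASCII, placed in the upper 32 bits of Appnum.
constexpr std::uint64_t shardTypeTag = 0x5348524400000000ull;

// Read `count` bytes starting at `offset` as one big-endian integer.
inline std::uint64_t
readBE(Hash160 const& h, std::size_t offset, std::size_t count)
{
    assert(count <= 8 && offset + count <= h.size());
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < count; ++i)
        v = (v << 8) | h[offset + i];
    return v;
}

// Hypothetical aggregate holding the deterministic header values.
struct HeaderParams
{
    std::uint64_t uid;
    std::uint64_t salt;
    std::uint64_t appnum;
};

inline HeaderParams
makeHeaderParams(Hash160 const& h)
{
    HeaderParams p;
    p.uid = readBE(h, 0, 8);                     // digest(0): H[0..7]
    p.salt = readBE(h, 8, 8);                    // digest(1): H[8..15]
    p.appnum = readBE(h, 16, 4) | shardTypeTag;  // digest(2): H[16..19]
    return p;
}
```

Because `digest(2)` occupies only the lower 32 bits, OR-ing it with the tag never disturbs the `'SHRD'` marker in the upper half.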


## Contents

After a deterministic shard is created with the headers described above,
it is filled with objects using the following steps.

1. All ledgers of the shard are visited in descending order (from high
ledger sequences to low).

2. For each ledger, all of its objects are visited in natural SHAMap tree
traversal order. The objects are: ledgers, SHAMap tree nodes (including
accounts and transactions), and the final key. The final key object is
visited after the last visited ledger.

3. The objects of the shard are divided into groups. Each group except
the last contains 16384 objects, in the order in which they were visited.
The last group may contain fewer than 16384 objects, and it also contains
the final key object.

4. The objects within each group are sorted by their hashes in increasing
order; precisely, in increasing lexicographic order of the hex
representations of the hashes. For example, the following hashes, shown
in hex, are in sorted order:
```
0000000000000000000000000000000000000000000000000000000000000000
154F29A919B30F50443A241C466691B046677C923EE7905AB97A4DBE8A5C2429
2231553FC01D37A66C61BBEEACBB8C460994493E5659D118E19A8DDBB1444273
272DCBFD8E4D5D786CF11A5444B30FB35435933B5DE6C660AA46E68CF0F5C441
3C062FD9F0BCDCA31ACEBCD8E530D0BDAD1F1D1257B89C435616506A3EE6CB9E
58A0E5AE427CDDC1C7C06448E8C3E4BF718DE036D827881624B20465C3E1336F
...
```

5. Finally, the objects are added to the deterministic shard group by group,
and within each group in sorted order, from low to high hashes.
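
Steps 3 to 5 can be sketched in C++ as follows. This is an illustration under simplifying assumptions, not the shard implementation: it works on bare 32-byte hashes rather than full objects, and `Hash256`, `groupSize`, and `storageOrder` are hypothetical names. Comparing the raw bytes lexicographically gives the same order as comparing fixed-width hex strings of the same hashes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Hash256 = std::array<std::uint8_t, 32>;

// Number of objects per group, as given in step 3.
constexpr std::size_t groupSize = 16384;

// Takes object hashes in visiting order and returns them in the order
// in which they would be added to the shard: groups keep their visiting
// order, and each group is sorted internally from low to high hash.
inline std::vector<Hash256>
storageOrder(std::vector<Hash256> hashes, std::size_t perGroup = groupSize)
{
    assert(perGroup > 0);
    for (std::size_t begin = 0; begin < hashes.size(); begin += perGroup)
    {
        // The last group may be shorter than perGroup.
        std::size_t const end = std::min(begin + perGroup, hashes.size());
        // std::array compares element by element, i.e. byte-wise
        // lexicographic order.
        std::sort(
            hashes.begin() + static_cast<std::ptrdiff_t>(begin),
            hashes.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return hashes;
}
```

The `perGroup` parameter exists only so the behaviour can be exercised with small inputs; the document fixes the group size at 16384.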


## Tests

To test the deterministic shard implementation, run the following command:
```
rippled --unittest ripple.NodeStore.DatabaseShard
```

The deterministic shard test should produce the following output:
```
ripple.NodeStore.DatabaseShard DatabaseShard deterministic_shard
with backend nudb
Iteration 0: RIPEMD160[nudb.key] = F96BF2722AB2EE009FFAE4A36AAFC4F220E21951
Iteration 0: RIPEMD160[nudb.dat] = FAE6AE84C15968B0419FDFC014931EA12A396C71
Iteration 1: RIPEMD160[nudb.key] = F96BF2722AB2EE009FFAE4A36AAFC4F220E21951
Iteration 1: RIPEMD160[nudb.dat] = FAE6AE84C15968B0419FDFC014931EA12A396C71
```

32 changes: 27 additions & 5 deletions src/ripple/nodestore/backend/NuDBFactory.cpp
@@ -38,7 +38,11 @@ namespace NodeStore {
class NuDBBackend : public Backend
{
public:
- static constexpr std::size_t currentType = 1;
+ static constexpr std::uint64_t currentType = 1;
+ static constexpr std::uint64_t deterministicMask = 0xFFFFFFFF00000000ull;
+
+ /* "SHRD" in ASCII */
+ static constexpr std::uint64_t deterministicType = 0x5348524400000000ull;

beast::Journal const j_;
size_t const keyBytes_;
@@ -98,7 +102,8 @@ class NuDBBackend : public Backend
}

void
- open(bool createIfMissing) override
+ open(bool createIfMissing, uint64_t appType, uint64_t uid, uint64_t salt)
+ override
{
using namespace boost::filesystem;
if (db_.is_open())
@@ -119,8 +124,9 @@ class NuDBBackend : public Backend
dp,
kp,
lp,
- currentType,
- nudb::make_salt(),
+ appType,
+ uid,
+ salt,
keyBytes_,
nudb::block_size(kp),
0.50,
@@ -133,7 +139,17 @@ class NuDBBackend : public Backend
db_.open(dp, kp, lp, ec);
if (ec)
Throw<nudb::system_error>(ec);
- if (db_.appnum() != currentType)
+
+ /** The old value currentType is accepted for appnum in traditional
+ * databases; the new value is used for deterministic shard databases.
+ * The new 64-bit value is constructed from a fixed and a random part.
+ * The fixed part is selected by the bitmask deterministicMask,
+ * and its value is deterministicType.
+ * The random part depends on the contents of the shard and may be any value.
+ * The contents of the appnum field must match either the old or the new rule.
+ */
+ if (db_.appnum() != currentType &&
+ (db_.appnum() & deterministicMask) != deterministicType)
Throw<std::runtime_error>("nodestore: unknown appnum");
db_.set_burst(burstSize_);
}
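
The acceptance rule in the hunk above can be isolated as a small predicate. The following sketch is for illustration only: `isKnownAppnum` and `exampleUsage` are hypothetical and not part of the commit, and the three constants are copied from the `NuDBBackend` members shown earlier.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the NuDBBackend constants in this commit.
constexpr std::uint64_t currentType = 1;
constexpr std::uint64_t deterministicMask = 0xFFFFFFFF00000000ull;
constexpr std::uint64_t deterministicType = 0x5348524400000000ull;

// Hypothetical free function mirroring the condition in open():
// accept the legacy appnum, or any value whose upper 32 bits spell 'SHRD'.
constexpr bool
isKnownAppnum(std::uint64_t appnum)
{
    return appnum == currentType ||
        (appnum & deterministicMask) == deterministicType;
}

// Example: a traditional database and a deterministic shard are both
// accepted, while an unrelated value is rejected.
inline void
exampleUsage()
{
    assert(isKnownAppnum(currentType));
    assert(isKnownAppnum(deterministicType | 0x11121314ull));
    assert(!isKnownAppnum(0));
}
```

Masking before the comparison means the lower 32 bits, which come from `digest(2)`, do not affect whether the database is accepted.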
@@ -144,6 +160,12 @@ class NuDBBackend : public Backend
return db_.is_open();
}

+ void
+ open(bool createIfMissing) override
+ {
+ open(createIfMissing, currentType, nudb::make_uid(), nudb::make_salt());
+ }
+
void
close() override
{