[#2688] Deterministic Database Shards #3363
Conversation
Depends on #3332
Please don't use MD5 for anything. Even if its use might be OK in this instance, just keeping/using it anywhere in the code base is the wrong move, in my opinion.
No problem if this is the official position of Ripple.
I've marked a few nits, but there's a couple of big ticket items that I want to see addressed.
where `H[i]` denotes the `i`-th byte of hash `H`.
First question, and this is more theoretical: should we modify NuDB to increase the size of the headers of the key and data files so that they are both precisely 512 bytes, increase the size of some fields (specifically the UID field, which should be 256 bits), and allocate a portion of the header as an area for the database owner to store their own metadata?
Now on to more practical matters: I really don't like the way that we're taking a cryptographic hash function, chopping its output up and manipulating it.
If we make the above changes to the NuDB header structure, we can keep using SHA256 and just set UID to the hash, while storing some metadata in the reserved area.
If we can't do that, then rather than chopping up and recombining the hash, I recommend the following: replace SHA256 with RIPEMD-160 and use it to calculate H over the following data:
uint256 lastHash Hash of last ledger in shard
uint32 index Index of the shard
uint32 firstSeq Sequence number of first ledger in the shard
uint32 lastSeq Sequence number of last ledger in the shard
uint32 version Version of shard, 2 at the present
Crack the 160-bit hash H apart into two 64-bit values A and B and one 32-bit value C, and set: UID := A; SALT := B; and APPNUM := C | 0x5348524400000000 (0x53485244 == 'S' 'H' 'R' 'D').
This keeps the full 160-bit hash.
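The splitting scheme proposed above can be sketched in a few lines. This is an illustrative helper only (`ShardHeaderFields` and `splitDigest` are hypothetical names, not rippled API), assuming the 20-byte digest is consumed little-endian in three consecutive chunks:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 160-bit (20-byte) digest H into the three
// NuDB header fields proposed in the review comment above. The byte layout
// (little-endian packing of bytes 0..7, 8..15, 16..19) is an assumption
// for illustration only.
struct ShardHeaderFields
{
    std::uint64_t uid;     // A: first 64 bits of H
    std::uint64_t salt;    // B: next 64 bits of H
    std::uint64_t appnum;  // C | 0x5348524400000000 ("SHRD" tag)
};

ShardHeaderFields
splitDigest(std::array<std::uint8_t, 20> const& h)
{
    // Read 8 bytes starting at `off` as a little-endian 64-bit value.
    auto read64 = [&h](int off) {
        std::uint64_t v = 0;
        for (int i = 7; i >= 0; --i)
            v = (v << 8) | h[off + i];
        return v;
    };
    // C: the remaining 32 bits, bytes 16..19, little-endian.
    std::uint64_t c = 0;
    for (int i = 3; i >= 0; --i)
        c = (c << 8) | h[16 + i];
    return {read64(0), read64(8), c | 0x5348524400000000ull};
}
```

Because A, B, and C together cover all 20 bytes, no entropy from the hash is discarded.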
Saving the hash of the shard as the UID/salt was not required by this ticket; it is my proposition. The ticket recommends setting the UID/salt to a known constant + shard_index. Your proposition of using RIPEMD-160 instead of SHA256 is better. I think that changing the NuDB standard is much more difficult, because the current standard is used on mainnet, if I am not mistaken.
Done
@@ -114,7 +114,7 @@ class NuDBBackend
     {
         create_directories(folder);
         nudb::create<nudb::xxhasher>(dp, kp, lp,
-            currentType, nudb::make_salt(), keyBytes_,
+            currentType, (salt ? *salt : nudb::make_salt()), keyBytes_,
Consider `salt.value_or(nudb::make_salt())` instead.
Done
src/ripple/nodestore/Backend.h
Outdated
    @param salt Deterministic salt used to create a backend.
        This allows the caller to catch exceptions.
*/
virtual void open(bool createIfMissing, boost::optional<uint64_t> salt) {}
This isn't ideal: it allows an implementation detail to "bleed" into the API, but I don't quite have a better idea on how to handle it.
At a minimum, the default implementation of this function, here in Backend, should throw a std::runtime_error exception with an appropriate error message that explains that the specific backend does not support deterministic salts.
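A minimal sketch of that suggested default follows. This is not the real rippled interface (which uses boost::optional and has many more virtual members); `getName()` and the message wording are assumptions for illustration:

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Sketch of the suggested Backend default. std::optional stands in for
// boost::optional; getName() is an assumed accessor.
class Backend
{
public:
    virtual ~Backend() = default;

    virtual std::string
    getName() const = 0;

    // Default: a backend that does not support deterministic salts refuses
    // to open with one, instead of silently ignoring the request.
    virtual void
    open(bool createIfMissing, std::optional<std::uint64_t> salt)
    {
        throw std::runtime_error(
            "Deterministic salt not supported by backend " + getName());
    }
};

// Example backend that keeps the default open().
class MemoryBackend : public Backend
{
public:
    std::string
    getName() const override
    {
        return "Memory";
    }
};

// Returns the error message produced by the default open(), or "" if none.
std::string
openError(Backend& b)
{
    try
    {
        b.open(true, std::uint64_t{42});
        return "";
    }
    catch (std::runtime_error const& e)
    {
        return e.what();
    }
}
```

A backend that does support deterministic shards would simply override `open` and use the supplied salt.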
Done
if (!keyfile || !datfile)
    return false;

nudb::detail::key_file_header keyhead;
@p2peer, @miguelportilla we are reaching inside NuDB internals here; we absolutely do not want to do this. We need to modify NuDB so that it cleanly supports the operations that we need to perform.
Which way do you want to modify NuDB: 1) include NuDB in rippled and change the code; or 2) contact the maintainers of NuDB and ask them to push a modification? I don't know if way 2 will be effective.
The author of NuDB said that he will research this; see his comment below. While he is researching, I put the code under the macro NUDB_SUPPORTS_CUSTOM_UID.
src/test/jtx/impl/Env.cpp
Outdated
if (resp["result"]["status"] != std::string("success"))
{
    std::cerr << "Env::close() failed: " << resp["result"]["status"] << std::endl;
    res = false;
I can't imagine any situation where ledger_accept would not return success, but I know that you encountered at least one instance; I think it's worth understanding what happened there, because other than not running in standalone mode, there's no way the operation can fail, and the unit tests essentially "force" standalone mode.
I would definitely not use std::cerr to log from here; use test.log << "whatever" instead.
Several failures of Env::close() appear on almost every local run or Travis run; see the Travis log for example: https://travis-ci.com/github/ripple/rippled/jobs/320158724, lines 713-721 and 773-775. I used std::cerr to see these messages in the Travis log with the --quiet option.
Writing to std::cerr has been replaced by writing to the journal.
src/ripple/nodestore/impl/Shard.cpp
Outdated
if (dsh)
    dsh->store(nObj);

if (!valLedger(ledger, next, dsh))
While you're here, please rename this function verifyLedger. Rationale: valLedger is a poor name, and the term validate has a very different meaning in our code; we should avoid confusion. Similarly, please rename the message below: "verification check failed"
No problem, if the author (@miguelportilla, if I am not mistaken) agrees with it.
verifyLedger is okay with me.
Done
None of this is necessary: all you need to do is store a key/value pair with a known key constant (say, the number 1 expressed as a 256-bit little-endian integer) and with the value containing all the data you want. This can all be accomplished without changing NuDB.
These data are already saved as a finalKey under key 0. But the problem is that the UID and salt should be deterministic. I can pass a custom salt to the NuDB constructor, but I can't pass a UID. That is the problem, so a modification of the NuDB constructor is required.
The UID is not user-modifiable. The purpose of this field is to trivially detect when a data file does not match the key file.
Oh, I see... you want two shards generated by different machines to have the same NuDB file contents. In this case, yes, I think we can add UID as a parameter to
I guess that, to prevent these complexity attacks, the hash of the last header in the shard is added to the salt, so there's no real way to mount such an attack until it is already too late to know how the hash will be modified for permanent storage. I am somewhat concerned about compression of values stored in there. Which compression scheme (if any) must be used?
There is no way to mount an attack, because each node builds the shard using a random salt. And when the shard is preprocessed for sharing, it is rewritten (to put the key/value pairs in order and thus be deterministic). But since the shard at this point is read-only, no attack is possible.
Wouldn't an algorithmic attack also affect a database file that is only being read from (and that only contains a single bucket with tons of spill entries due to hash collisions, in the extreme case)? Edit: Also, the same contents (semantically) are not enough. The two files generated on two different machines need to be bitwise identical. You probably already meant this by "deterministic", just to be sure.
An algorithmic attack on the database works by trying to get a client to build or accept a database whose keys collide. A client only accepts a database if the contents have been signed by validators. The deterministic salt cannot be predicted in advance, so the attacker has no idea what ledger entries she must create in order to cause collisions. And once the shard is built, an attacker cannot insert anything, so no attack is possible.
Yes, due to the header hash of the last header that gets mixed in, as I wrote already. I probably misunderstood your initial comment about the "just use a constant key of 1" somewhere. As far as I understand this here, so far it will create a deterministic header and a deterministic order of keys. I'm unsure if (e.g. after an update of
As a data point, storing these shards with uncompressed values and then compressing the resulting files with
Compressing the entire uncompressed data file instead of compressing each individual key/value pair makes a lot more sense (and is easily extensible too, if you put a header in front of the resulting compressed file).
Changed to RIPEMD-160.
For now, compression is used by default. I can change this to no compression, if that is better.
Codecov Report
@@ Coverage Diff @@
## develop #3363 +/- ##
===========================================
+ Coverage 70.23% 71.97% +1.73%
===========================================
Files 683 684 +1
Lines 54630 54794 +164
===========================================
+ Hits 38370 39437 +1067
+ Misses 16260 15357 -903
Continue to review full report at Codecov.
src/ripple/nodestore/Backend.h
Outdated
    boost::optional<uint64_t> salt)
{
    Throw<std::runtime_error>(std::string(
        "Deterministic appType/uid/salt not supported by bachend " +
Spelling error: bachend -> backend
Done
-static constexpr std::size_t currentType = 1;
+static constexpr std::uint64_t currentType = 1;
+static constexpr std::uint64_t deterministicType = 0x5348524400000000ll;
+static constexpr std::uint64_t deterministicMask = 0xFFFFFFFF00000000ll;
Nitpick: A ull suffix would be more consistent.
Done
I left just a few nitpicky comments.
{
public:
    using nodeptr = std::shared_ptr<NodeObject>;
Consider adding:
DeterministicShard(DeterministicShard const&) = delete;
DeterministicShard& operator=(DeterministicShard const&) = delete;
This won't change anything. DeterministicShard is neither copy- nor move-constructible, nor copy- nor move-assignable, due to the combination of the user-declared destructor and the data members. But that fact is subtle. By explicitly deleting the copy members you make it clearer that this class has no copy or move members.
Some people would go further and delete the move members too. My personal taste is that that is overly verbose, because the user-declared copy members suppress the move members.
Done
The `digest(i)` mentioned above is defined as follows:

First, SHA256 hash `H` is calculated over the following structure
This still needs to be changed to the actual RIPEMD160 implementation.
Done
backend_->open(
    false,
    digest(2) | 0x5348524400000000ll, /* appType */
Maybe a small explanation in the comment for this magic number? ("SHRD" in ASCII)
Done
Needs rebasing to 1.6.0-b5.
Having an option for storing data uncompressed would be really nice. Might be something for a future iteration though.
Looks good. Some of my comments are questions to clarify the design.
The DatabaseShardImp.cpp file has a few changes, totaling about 6 lines, that I am not familiar enough with the Shard code to review. Please make sure other reviewers who know Shard well review them.
Looks good.
Please edit the documentation of the flush() function in DeterministicShard.h.
Done
Depends on #3437
RIPEMD160_CTX c;
uint8_t result[20];

RIPEMD160_Init(&c);
RIPEMD160_Update(&c, &lastHash, sizeof(lastHash));
RIPEMD160_Update(&c, &index_, sizeof(index_));
auto firstSeq = db_.firstLedgerSeq(index_);
RIPEMD160_Update(&c, &firstSeq, sizeof(firstSeq));
auto lastSeq = db_.lastLedgerSeq(index_);
RIPEMD160_Update(&c, &lastSeq, sizeof(lastSeq));
RIPEMD160_Update(&c, &Shard::version, sizeof(Shard::version));
RIPEMD160_Final(result, &c);
Two comments:
- Lines like RIPEMD160_Update(&c, &index_, sizeof(index_)); are problematic because they assume endianness. This won't bite us today, because we only support little-endian, but it may bite us tomorrow if we try to compile on a big-endian platform.
- This should be using ripple::ripemd160_hash instead; doing so also fixes the endianness issue above.
ripemd160_hasher h;
hash_append(h, lastHash);
hash_append(h, index_);
hash_append(h, db_.firstLedgerSeq);
hash_append(h, db_.lastLedgerSeq);
hash_append(h, Shard::version);
// result is std::array<std::uint8_t, 20>
auto const result = static_cast<ripemd160_hasher::result_type>(h);
Done
The `digest(i)` mentioned above is defined as follows:

First, RIPEMD160 hash `H` is calculated over the following structure
Please comment that all 32-bit integers are hashed in network byte order.
Done
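The point about hashing integers in network byte order can be illustrated with a tiny helper. This is a sketch of the idea only (`toNetworkOrder` is an assumed name; rippled's `hash_append` machinery handles this internally): converting every 32-bit value to big-endian before feeding it to the hasher makes the digest identical on little- and big-endian hosts.

```cpp
#include <cstdint>

// Convert a 32-bit value to network byte order (big-endian) before hashing,
// so the resulting digest does not depend on the host's endianness.
// Illustrative sketch; not a rippled API.
std::uint32_t
toNetworkOrder(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
        ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}
```

On a big-endian host the same function is an identity in effect (bytes already in wire order), which is why writing it as explicit shifts, rather than a raw memcpy of the in-memory representation, is the portable choice.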
std::uint64_t
DeterministicShard::digest(int n) const
{
    int ind;
    std::uint64_t res = 0;
    for (int i = sizeof(std::uint64_t) - 1; i >= 0; --i)
    {
        res <<= 8;
        ind = i + n * sizeof(std::uint64_t);
        if (ind < 160 / 8)
            res += hash_.data()[ind] & 0xff;
    }
    return res;
}
This function works but seems a little overly complicated for what it's trying to do. You don't need to change it, but I'm wondering how you feel about this:
std::uint64_t
DeterministicShard::digest(int n) const
{
auto const data = hash_.data();
if (n == 2)
{ // Extract 32 bits:
return (static_cast<std::uint64_t>(data[19]) << 24) +
(static_cast<std::uint64_t>(data[18]) << 16) +
(static_cast<std::uint64_t>(data[17]) << 8) +
(static_cast<std::uint64_t>(data[16]));
}
std::uint64_t ret = 0;
if (n == 0 || n == 1)
{ // Extract 64 bits
for(int i = n; i < 16; i += 2)
ret = (ret << 8) + data[i];
}
return ret;
}
Note that digest(2) now returns 32 bits (as a 64-bit value), while digest(0) and digest(1) return 64-bit values.
Done
#include <test/jtx.h>
#include <test/nodestore/TestBase.h>

namespace ripple {
namespace NodeStore {

/* std::uniform_int_distribution is platform dependent */
Ugh... I don't know that I care for all this. I understand the need, given the platform-specific implementation details of std::uniform_int_distribution. I wonder if there's a better option here...
What's the motivation for doing this? If it's to make sure we can repro unit test bugs on all platforms, then I'd just live with the platform dependence and remove this class. There is a very small possibility that we'll see failures on some platforms that we won't be able to repro on other platforms in the first place. If this happens, we'll just have to debug on the platforms where we can repro.
(if there's another reason for the class then nvm, I only took a very quick look before weighing in).
My unit test for deterministic shards is the following: it generates predictable accounts and transactions, packs them into ledgers, and makes the shard. The hash of this shard should be equal to a given value. I saw that on different platforms (specifically, Linux and Mac) the hashes of the resulting shard were different. I investigated this case and determined that the problem is in the class std::uniform_int_distribution, which generates different pseudorandom sequences on different platforms, but we need a predictable sequence.
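The underlying issue is that the C++ standard fully specifies the raw output of `std::mt19937_64`, but leaves `std::uniform_int_distribution`'s mapping of that output to a range implementation-defined. A portable workaround, sketched here under assumed names (this is not the class used in the test code), is rejection sampling on the raw engine output:

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Platform-independent replacement for uniform_int_distribution:
// std::mt19937_64's raw sequence is fixed by the standard, so rejection
// sampling on it yields identical values everywhere. Illustrative sketch.
std::uint64_t
uniformBelow(std::mt19937_64& g, std::uint64_t bound)  // result in [0, bound)
{
    // Discard the tail of the generator's range that would bias the modulo.
    std::uint64_t const max = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t const limit = max - (max % bound);
    std::uint64_t v;
    do
        v = g();
    while (v >= limit);
    return v % bound;
}
```

Two identically seeded engines fed through this function produce the same sequence on every conforming implementation, which is exactly the property the shard hash test needs.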
@p2peer That is a different motivation than what I was envisioning. Thanks for the clarification!
Done: added comment to the code
@@ -978,6 +1090,93 @@ class DatabaseShard_test : public TestBase
    }
}

std::string
ripemd160File(std::string filename)
Same comment as earlier: please use the hasher.
Done
Looks good. Please consider updating the commit message (ref). I recommend:
Add support for deterministic database shards (#2688):
This commit, if merged, adds support to allow multiple independent nodes to
produce a binary identical shard for a given range of ledgers. The advantage
is that servers can use content-addressable storage, and can more efficiently
retrieve shards by downloading from multiple peers at once and then verifying
the integrity of a shard by cross-checking its checksum with the checksum
other servers report.
Deterministic Database Shards
This doc describes the standard way to assemble a database shard. A shard assembled using this approach becomes deterministic, i.e. if two independent sides assemble a shard consisting of the same ledgers, accounts, and transactions, then they will obtain the same shard files `nudb.dat` and `nudb.key`. The approach deals with the `NuDB` database format only; refer to https://github.com/vinniefalco/NuDB.
Headers
Due to the NuDB database definition, the following headers are used for the database files:
nudb.key:
nudb.dat:
All fields are saved using network byte order (most significant byte first).
To make the shard deterministic, the following parameters are used as the values of the header fields for both the `nudb.key` and `nudb.dat` files.
Note: xxhasher is the native hasher of NuDB; refer to the NuDB code.
The `digest(i)` mentioned above is defined as follows:
First, SHA256 hash `H` is calculated over the following structure.
Then, `digest(i)` is defined as the `i`-th 64-bit portion of the above hash `H` with swapped byte order, i.e.
where `H[i]` denotes the `i`-th byte of hash `H`.
Contents
After the deterministic shard is created using the above-mentioned headers, it is filled with objects. First, all objects of the shard are collected and sorted according to their hashes. Here the objects are: ledgers, SHAMap tree nodes including accounts and transactions, and the final key object with hash 0. Objects are sorted by increasing hash; precisely, by the increasing hex representations of the hashes in lexicographic order.
For example, the following is an example of sorted hashes in their hex representation:
Finally, objects are added to the shard one by one in the sorted order, from low to high hashes.
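The sorting step above can be sketched directly: comparing the raw hash bytes lexicographically (which `std::array`'s `operator<` does) orders them exactly as the hex-representation rule describes. `Hash` and `sortByHash` are illustrative names here, not rippled types:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Illustrative sketch of ordering node objects before insertion into a
// deterministic shard. Lexicographic comparison of raw hash bytes is
// equivalent to lexicographic comparison of their hex representations.
using Hash = std::array<std::uint8_t, 32>;

void
sortByHash(std::vector<Hash>& hashes)
{
    // std::array compares element-by-element, i.e. lexicographically.
    std::sort(hashes.begin(), hashes.end());
}
```

Note that under this ordering the final key object (hash 0) always sorts first.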
Tests
To test the deterministic shards implementation, one can enter the following command:
The following is the expected output of the deterministic shards test: