Deterministic shard databases/files #2688

MarkusTeufelberger · 2018-09-18T00:26:47Z

As discussed in #2625 I'd like to propose a few changes to how shard databases are currently being built with the goal to have them created deterministically.

These changes would be:

Only allow NuDB for shard databases (at least the deterministic kind). RocksDB is too much of a moving target.
Values that are inserted into the shard database must be deterministically compressed (this could include not compressing them at all...)
salt, app_num, key_size, block_size, load_factor and the hashing algorithm must be fixed and not derived from properties of the current system (sane defaults would probably be an easily recognizable number + the current shard number as salt, 1 for the app_num, 32 bytes for the key size [as is currently standard anyways], 4 KiB for the block_size, 0.5 for the load_factor and xxhash for the hashing algorithm)
Deterministic shard database files can only be written once the last (highest) ledger sequence of that shard has been closed and only if all nodes of the 16384 ledgers they contain are available
To write such a database file, all nodes contained in the shard must be written single-threaded in sorted order (ascending or descending, whatever is easier to implement) to a NuDB file. I haven't checked yet if there are other caches or optimizations going on, but hopefully that should already be enough to guarantee a deterministic result. If there's a lot of RAM available, it might be possible to load the whole shard into memory and sort it there, otherwise it could mean hitting the node store twice (once for getting all the keys and after sorting them for the values again).

Once such a file has been generated (ideally also with a canonical name...), it then is possible to compare hashes with others, immediately help seeding torrents or IPFS swarms or use this knowledge to implement a trustless way of quickly getting a lot of history without hitting your peer's node stores or shard databases. If you want to get very fancy, it would even be possible to add IPFS or BTinfo hashes to the ledger itself, if validators are willing (and able) to download and verify their contents (e.g. by extending rippled into a historic-headers-only mode and then asserting that a shard database contains all nodes necessary to walk all SHAmaps in the specified range with an unbroken header chain up to the latest one).

An alternative to forcing determinism upon NuDB would be to not use a database file format at all and just specify a deterministic export/import format for shards. An example would be a CSV file containing sorted key-value pairs in simple hex or base64 encoding. If these are independently verifiable, it might be already enough. As far as I understand the shard import code though, it kinda expects being able to randomly query the import file like a database.

What do you guys think, which approach is better - getting NuDB to do something it wasn't really designed for or to already overhaul/refactor something in rippled that was just added on the latest release? I started some experiments with the "deterministic NuDB" approach, but I'd be more motivated to work on something presentable once I know which approach is preferred by you guys.

Tagging/pinging @miguelportilla and @nbougalis for sharding and overall design/feature roadmap expertise.

MarkusTeufelberger · 2018-11-18T19:23:03Z

Alright, I finally pulled through and finished my re-implementation of NuDB fetching and ledger node decoding in Python and could run some tests.

An interesting observation so far:

I write uncompressed nodes to the data store for now (it is really easy to write a dat file that way).
The dat file for shard 2 (ledgers [32769:49152], 82757 objects in total) is 167056294 bytes long (~160 MiB)
Re-keyed with the nudb tool for now (haven't implemented that part yet) and turned into a tar.lz4 file via tar cvf - 2 | lz4 -9 - 2.tar.lz4 leads to a file that's about 78% the original size (Compressed 170055680 bytes into 132427450 bytes ==> 77.87%)
BUT:
Compared to the NuDB files from #2561 this is horribly inefficient: They contain exactly the same data, only in a slightly different storage format, leading to a 155628595 bytes dat file (~149 MiB), but these compress down to under 7MiB. This is also a much better approximation of the actually contained information in this file - these early ledgers are relatively small and there are less than hundred transactions in this shard.

Interestingly, zstd does not show these limitations (at least when using its own algorithm, it still creates too large lz4 files).

MarkusTeufelberger · 2018-11-21T02:44:27Z

Alright, I managed to create a tar.lz4 file with deterministic contents that gets accepted by rippled and added to the shard store!

🎉

Still to do: Create a tar archive deterministically (they contain all sorts of timestamps...) and then make sure that lz4 is also deterministic in operation...

There is a bigger downside though - upon import of shard tar.lz4 files, rippled just seems to unpack them, it does NOT recompress the contents upon import (implementing compression schemes other than the InnerNodes one does not sound very stable to me...) or process them in any way.

This commit, if merged, adds support to allow multiple indepedent nodes to produce a binary identical shard for a given range of ledgers. The advantage is that servers can use content-addressable storage, and can more efficiently retrieve shards by downloading from multiple peers at once and then verifying the integrity of a shard by cross-checking its checksum with the checksum other servers report.

This reverts commit 4dc08f8.

Add support to allow multiple indepedent nodes to produce a binary identical shard for a given range of ledgers. The advantage is that servers can use content-addressable storage, and can more efficiently retrieve shards by downloading from multiple peers at once and then verifying the integrity of a shard by cross-checking its checksum with the checksum other servers report.

MarkusTeufelberger mentioned this issue Oct 2, 2018

Add RPC command shard crawl (RIPD-1663) #2697

Closed

nbougalis added the Feature Request Used to indicate requests to add new features label Nov 1, 2018

nbougalis assigned miguelportilla Nov 19, 2018

manojsdoshi added a commit to manojsdoshi/rippled that referenced this issue Jul 22, 2020

Revert "Add support for deterministic database shards (XRPLF#2688):"

c086c0b

This reverts commit 4dc08f8.

manojsdoshi added a commit to manojsdoshi/rippled that referenced this issue Aug 5, 2020

Revert "Add support for deterministic database shards (XRPLF#2688):"

04304e1

This reverts commit 4dc08f8.

carlhua added the Reviewed label Oct 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deterministic shard databases/files #2688

Deterministic shard databases/files #2688

MarkusTeufelberger commented Sep 18, 2018

MarkusTeufelberger commented Nov 18, 2018

MarkusTeufelberger commented Nov 21, 2018

Deterministic shard databases/files #2688

Deterministic shard databases/files #2688

Comments

MarkusTeufelberger commented Sep 18, 2018

MarkusTeufelberger commented Nov 18, 2018

MarkusTeufelberger commented Nov 21, 2018