Deterministic shard databases/files #2688
Comments
Alright, I finally pulled through and finished my re-implementation of NuDB fetching and ledger node decoding in Python, and could run some tests. One observation so far: I write uncompressed nodes to the data store for now (it is really easy to write a dat file that way). Interestingly, zstd does not show these limitations, at least when using its own algorithm; it still creates lz4 files that are too large.
Alright, I managed to create a tar.lz4 file with deterministic contents that gets accepted by rippled and added to the shard store! 🎉 Still to do: create the tar archive itself deterministically (tar archives contain all sorts of timestamps...) and then make sure that lz4 is also deterministic in operation... There is a bigger downside though: upon import of shard tar.lz4 files, rippled just seems to unpack them. It does NOT recompress the contents upon import (implementing compression schemes other than the InnerNodes one does not sound very stable to me...) or process them in any way.
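For the "create a tar archive deterministically" item, here is a minimal sketch of one way to do it with Python's standard `tarfile` module. This is not what I actually ran. The function name `deterministic_tar` and the member names in the usage note are placeholders. It only covers the tar layer, since lz4 would still have to be checked separately.

```python
import io
import tarfile
from typing import Mapping


def deterministic_tar(files: Mapping[str, bytes]) -> bytes:
    """Build a tar archive whose bytes depend only on member names and contents."""
    buf = io.BytesIO()
    # USTAR_FORMAT is chosen here to avoid pax extended headers.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order, independent of input order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0              # no wall-clock timestamps
            info.mode = 0o644           # fixed permissions
            info.uid = info.gid = 0     # no owner ids from the local machine
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Calling `deterministic_tar({"nudb.key": key_bytes, "nudb.dat": dat_bytes})` on two machines with the same file contents should then give byte-identical archives, which can be hashed and compared before compression.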
This commit, if merged, adds support to allow multiple independent nodes to produce a binary-identical shard for a given range of ledgers. The advantage is that servers can use content-addressable storage, and can more efficiently retrieve shards by downloading from multiple peers at once and then verifying the integrity of a shard by cross-checking its checksum with the checksum other servers report.
This reverts commit 4dc08f8.
Add support to allow multiple independent nodes to produce a binary-identical shard for a given range of ledgers. The advantage is that servers can use content-addressable storage, and can more efficiently retrieve shards by downloading from multiple peers at once and then verifying the integrity of a shard by cross-checking its checksum with the checksum other servers report.
As discussed in #2625, I'd like to propose a few changes to how shard databases are currently being built, with the goal of having them created deterministically.
These changes would be:
Once such a file has been generated (ideally also with a canonical name...), it is then possible to compare hashes with others, immediately help seed torrents or IPFS swarms, or use this knowledge to implement a trustless way of quickly getting a lot of history without hitting your peers' node stores or shard databases. If you want to get very fancy, it would even be possible to add IPFS or BTinfo hashes to the ledger itself, if validators are willing (and able) to download and verify their contents (e.g. by extending rippled with a historic-headers-only mode and then asserting that a shard database contains all nodes necessary to walk all SHAMaps in the specified range with an unbroken header chain up to the latest one).
An alternative to forcing determinism upon NuDB would be to not use a database file format at all and just specify a deterministic export/import format for shards. An example would be a CSV file containing sorted key-value pairs in simple hex or base64 encoding. If these are independently verifiable, it might be already enough. As far as I understand the shard import code though, it kinda expects being able to randomly query the import file like a database.
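To make the CSV idea a bit more concrete, here is a minimal sketch of what such an export/import could look like, assuming the shard's nodes are available as an in-memory mapping from key bytes to value bytes. The function names `export_shard_csv` and `import_shard_csv` and the one-pair-per-line layout are my own placeholders, not an existing rippled or NuDB format.

```python
from typing import Dict, Mapping


def export_shard_csv(nodes: Mapping[bytes, bytes]) -> bytes:
    """Serialize key-value pairs as 'hexkey,hexvalue' lines, sorted by key."""
    lines = []
    for key in sorted(nodes):  # byte-wise sort gives a canonical order
        lines.append(f"{key.hex()},{nodes[key].hex()}\n")
    return "".join(lines).encode("ascii")


def import_shard_csv(blob: bytes) -> Dict[bytes, bytes]:
    """Parse the export back into a key -> value mapping."""
    nodes: Dict[bytes, bytes] = {}
    for line in blob.decode("ascii").splitlines():
        key_hex, value_hex = line.split(",")
        nodes[bytes.fromhex(key_hex)] = bytes.fromhex(value_hex)
    return nodes
```

Because the output depends only on the set of pairs, two nodes holding the same data produce the same bytes, so hashing the export (for example with `hashlib.sha256`) gives a value that can be compared between servers. It does not provide random access by itself, so an importer that wants to query it like a database would have to load or index it first.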
What do you guys think, which approach is better: getting NuDB to do something it wasn't really designed for, or already overhauling/refactoring something in rippled that was only just added in the latest release? I started some experiments with the "deterministic NuDB" approach, but I'd be more motivated to work on something presentable once I know which approach you guys prefer.
Tagging/pinging @miguelportilla and @nbougalis for sharding and overall design/feature roadmap expertise.