[questions] Understanding history sharding #2625
Thanks for your questions, I've tried to answer them below.

The shard code uses two constants that can be modified to suit different networks. The first is the number of ledgers stored per shard (16384 on the XRP ledger network); the second is the sequence of the earliest ledger to store (32570 on the XRP ledger network).

1- Shard indexes are calculated using the integral value of (ledger sequence - 1) / (ledgers per shard). A shard's first ledger sequence can be calculated using the maximum of the earliest ledger sequence and 1 + (shard index * ledgers per shard). A shard's last ledger sequence can be calculated using the integral value of (shard index + 1) * (ledgers per shard). So the XRP ledger network calls the contents of ledgers [32570-32768] (both inclusive) "shard 1", [32769-49152] "shard 2" and so on.
2- Correct
3- Correct
4- Correct
5- Correct
6- In the XRP ledger network, ledger 32570 is in shard 1.
7- Changing the constants may create incompatibility; they are not meant to be modified after use.
8- No, shard archives are only imported into the shards database.
9- Since we only import into the shards database, the order doesn't matter. If we did import into the node database, then the strategy you suggested might help if using contiguous shard indexes.

I believe it is possible to create 'deterministically generated' shards as you described. That process can also be applied to convert existing shards. Implementing an IPFS client in rippled is on my to-do list.
Great to hear and thanks for your answers! :-) In that case I wonder why it was decided to go with the
Would a node that wants full history and has all/most shards locally on disk then fill its node_db from the shard_db as fast as it can read/write the data or would it still query the data from the network? I really hope the former is the case...
Write singlethreaded to a NuDB in deterministic order (e.g. sort all nodes by key alphabetically) and keep the salt of the database constant. I'm not so sure about the spill records, since the data gets written asynchronously; if the disk can't keep up, maybe that introduces problems? Should be testable though. A different option would be to have a dedicated import/export format (could be as simple as CSV, these are only key-value pairs of hex strings) for shards instead of sharing database files. This would also have the benefit of helping alternative use cases - NuDB is not exactly widely used, and RocksDB database files are also probably not that easy to use in a shard import context.
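For what it's worth, a dedicated export/import format along those lines could be as simple as the sketch below. This is purely illustrative, under the assumption that a shard's contents are exposed as a mapping of raw byte keys to raw byte values; nothing like this exists in rippled today.

```python
import csv

def export_shard(nodes, path):
    """Write the shard's (key, value) pairs as hex strings, sorted by key,
    so identical shard contents always produce byte-identical files."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for key in sorted(nodes):
            writer.writerow([key.hex(), nodes[key].hex()])

def import_shard(path):
    """Read the pairs back into a dict mapping bytes keys to bytes values."""
    with open(path, newline="") as f:
        return {bytes.fromhex(k): bytes.fromhex(v) for k, v in csv.reader(f)}
```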
Revisiting this because I'm slightly confused:
How can 49153 be the last ledger of shard 2 if the last one in a shard will always be a multiple of 16384? I think your example is off by one and the last ledger should be 49152.
@MarkusTeufelberger Correct. I am not sure how that 3 snuck in there. ((2 + 1) * 16384 = 49152)
Ah, I was already worried I misunderstood that part too. :-D For future me and/or anyone else reading this, here's the shard calculation stuff in Python3:
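(A minimal version based on the formulas above; constant names are just illustrative, values are the XRP Ledger network defaults.)

```python
LEDGERS_PER_SHARD = 16384  # ledgers per shard on the XRP Ledger network
EARLIEST_SEQ = 32570       # earliest ledger sequence stored on the XRP Ledger network

def shard_index(ledger_seq):
    """Index of the shard that contains the given ledger sequence."""
    return (ledger_seq - 1) // LEDGERS_PER_SHARD

def first_ledger_seq(index):
    """First ledger sequence stored in the given shard."""
    return max(EARLIEST_SEQ, 1 + index * LEDGERS_PER_SHARD)

def last_ledger_seq(index):
    """Last ledger sequence stored in the given shard."""
    return (index + 1) * LEDGERS_PER_SHARD

assert shard_index(32570) == 1
assert (first_ledger_seq(1), last_ledger_seq(1)) == (32570, 32768)
assert (first_ledger_seq(2), last_ledger_seq(2)) == (32769, 49152)
```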
@MarkusTeufelberger can we close this issue? Also, we welcome your feedback on #3455
I guess it can be closed; maybe @mDuo13 might want to look over this and see if it is covered in the documentation by now.
Are the following assumptions/statements around history sharding correct? (pinging @miguelportilla, who wrote most of the code, for help)
`earliest_seq = 32569`
so it will also store 32570 in a shard, it will likely be incompatible with the rest of the network (since "shard 0" is different from everyone else's "shard 0"), and sharing them with others using Add shard download and import RPC #2561 will likely end in at least one ledger being requested from the network when importing the data.

What I want to achieve is (once #2561 lands) to create a trustworthy (meaning: deterministically generated) way to share shards via IPFS and/or BitTorrent or other P2P filesharing networks, so it becomes easier to have a node with full history without using up that many resources on the rippled servers that hold this data. I have some ideas about how this might be doable with NuDB (just keep the salt static, write everything sorted and singlethreaded into a data file with no spill records, then generate the index, which should put the spill records at the end of the data file(?)), but I'd like to be sure first that I understood history sharding before I start wrangling with NuDB again.
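To illustrate how a non-default earliest sequence shifts the contents of the lowest shard (a rough sketch of the arithmetic described in the answers above, not code from rippled):

```python
LEDGERS_PER_SHARD = 16384  # ledgers per shard on the XRP Ledger network

def lowest_shard_range(earliest_seq):
    """Index and ledger range of the first shard a server would build,
    given its configured earliest stored ledger sequence."""
    index = (earliest_seq - 1) // LEDGERS_PER_SHARD
    first = max(earliest_seq, 1 + index * LEDGERS_PER_SHARD)
    last = (index + 1) * LEDGERS_PER_SHARD
    return index, first, last

print(lowest_shard_range(32570))  # (1, 32570, 32768), the XRP Ledger default
print(lowest_shard_range(32569))  # (1, 32569, 32768), one ledger more, so the shard differs
```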
PS: It would be nice if the `earliest_seq` hack could be removed, so shard IDs globally map to the same ledger ranges, and shards 0 and 1 are simply disabled on XRPL... but that's probably worth a separate issue.