
[questions] Understanding history sharding #2625

Closed
MarkusTeufelberger opened this issue Jul 12, 2018 · 7 comments
Labels: Documentation (README changes, code comments, etc.), Reviewed


@MarkusTeufelberger (Collaborator)

Are the following assumptions/statements about history sharding correct? (Pinging @miguelportilla, who wrote most of the code, for help.)

  1. Rippled (by default) will call the contents of ledgers [32571-48954] (both inclusive) "shard 0", [48955-65338] "shard 1" and so on.
  2. Every shard contains ALL nodes that appear in the 16384 ledgers it contains, even if they haven't changed during these 16384 ledgers (e.g. an AccountRoot node of a quite passive account).
  3. "All nodes" means they also contain InnerNodes.
  4. If 2 is true, that means that the total space used up by all shards is probably a bit larger than the actual node_db of a server with full history (since nodes are stored only once in there).
  5. Shard database files are not created deterministically, they can look potentially different on every single server out there, even if they contain the same data. This is even true if they were all generated with NuDB xor RocksDB.
  6. Ledger 0 (or ledger 32570 in the default config) will not make it into a shard.
  7. If I set my node to earliest_seq = 32569 so it also stores ledger 32570 in a shard, it will likely be incompatible with the rest of the network (since my "shard 0" is different from everyone else's "shard 0"), and sharing shards with others using Add shard download and import RPC #2561 will likely end in at least one ledger being requested from the network when importing the data.
  8. After importing the data with Add shard download and import RPC #2561 on a server that is configured for full history, but that has no data yet, the ledger contents will be written into both node_db and shard_db.
  9. It would be a smart strategy in the situation of 8 to start importing the latest shards (= highest IDs) first and then going back down to shard 0 instead of starting at 0 and moving forward in time.

What I want to achieve (once #2561 lands) is a trustworthy (meaning: deterministically generated) way to share shards via IPFS and/or BitTorrent or other P2P file-sharing networks, so it becomes easier to run a node with full history without using that many resources on the rippled servers that hold the data. I have some ideas about how this might be doable with NuDB (keep the salt static, write everything sorted and single-threaded into a data file with no spill records, then generate the index, which should put the spill records at the end of the data file(?)), but I'd like to be sure I understand history sharding before I start wrangling with NuDB again.

PS: It would be nice if the earliest_seq hack could be removed, so shard IDs map globally to the same ledger ranges, and shards 0 and 1 were simply disabled on XRPL... but that's probably worth a separate issue.

@miguelportilla (Contributor) commented Aug 17, 2018

Hi @MarkusTeufelberger,

Thanks for your questions, I've tried to answer them below.

The shard code uses two constants that can be modified to suit different networks. The first is the number of ledgers per shard. For the XRP Ledger network, that value is 16384, a compromise between resource usage and application performance. The second constant is the earliest ledger sequence allowed. For the XRP Ledger network, that value is 32570.

1- Shard indexes are calculated using the integral value of (ledger sequence - 1) / ledgers per shard.
Given ledger sequence 32570, we have (32570-1)/16384 = 1

A shard's first ledger sequence is the maximum of 1 + (shard index * ledgers per shard) and the earliest ledger sequence.
Given shard index 1, we have max(1 + (1 * 16384), 32570) = 32570

A shard's last ledger sequence is the integral value of (shard index + 1) * ledgers per shard.
Given shard index 1, we have (1 + 1) * 16384 = 32768

So the XRP ledger network calls the contents of ledgers [32570-32768] (both inclusive) "shard 1", [32769-49152] "shard 2" and so on.
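Put in code, the three formulas above read as follows (a quick Python sketch; the constants are the XRP Ledger values quoted in this comment, and the function names are just illustrative):

```python
# Shard boundary arithmetic for the XRP Ledger network, as described above.
LEDGERS_PER_SHARD = 16384     # ledgers per shard
EARLIEST_LEDGER_SEQ = 32570   # earliest ledger sequence allowed

def shard_index(ledger_seq: int) -> int:
    # integral value of (ledger sequence - 1) / ledgers per shard
    return (ledger_seq - 1) // LEDGERS_PER_SHARD

def first_ledger(index: int) -> int:
    # max of 1 + (shard index * ledgers per shard) and the earliest sequence
    return max(1 + index * LEDGERS_PER_SHARD, EARLIEST_LEDGER_SEQ)

def last_ledger(index: int) -> int:
    return (index + 1) * LEDGERS_PER_SHARD

assert shard_index(32570) == 1                        # earliest ledger is in shard 1
assert (first_ledger(1), last_ledger(1)) == (32570, 32768)
assert (first_ledger(2), last_ledger(2)) == (32769, 49152)
```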

2- Correct

3- Correct

4- Correct

5- Correct

6- In the XRP Ledger network, ledger 32570 is in shard 1.

7- Changing the constants may create incompatibility; they are not meant to be modified after use.

8- No, shard archives are only imported into the shards database.

9- Since we only import into the shards database, the order doesn't matter. If we did import into the node database, then the strategy you suggested might help if using contiguous shard indexes.

I believe it is possible to create 'deterministically generated' shards as you described. That process can also be applied to convert existing shards.

Implementing an IPFS client in rippled is on my list to do.

@MarkusTeufelberger (Collaborator, Author)

Great to hear and thanks for your answers! :-)

In that case I wonder why the earliest ledger sequence design was chosen instead of just skipping shards 1+2 by default. There is a nonzero chance that earlier ledgers could be recovered, and then all shard databases in the whole network might need to be migrated or changed because the earliest ledger sequence changes... compared to just not having a few hundred of the earliest ledgers in a shard, which would also make it much easier to calculate which shard a certain ledger height falls into.

> Since we only import into the shards database, the order doesn't matter.

Would a node that wants full history and has all/most shards locally on disk then fill its node_db from the shard_db as fast as it can read/write the data or would it still query the data from the network? I really hope the former is the case...

> I believe it is possible to create 'deterministically generated' shards as you described.

Write singlethreaded to a NuDB in deterministic order (e.g. sort all nodes by key alphabetically) and keep the salt of the database constant. I'm not so sure about the spill records, since the data gets written asynchronously, so if the disk can't keep up maybe it'll introduce problems? Should be testable though.
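The core of the determinism idea can be illustrated independently of NuDB: treat a shard's nodes as key→value byte strings and always serialize them sorted by key, so the output (and any hash over it) no longer depends on the order in which the nodes were acquired. A minimal sketch with made-up node data:

```python
import hashlib

def deterministic_digest(nodes: dict[bytes, bytes]) -> str:
    """Hash a shard's nodes in sorted-key order, so the result is
    independent of the order in which the nodes were acquired."""
    h = hashlib.sha256()
    for key in sorted(nodes):
        h.update(key)
        h.update(nodes[key])
    return h.hexdigest()

# Two servers that acquired the same nodes in a different order
# still produce the same digest.
a = {b"\x02" * 32: b"node-2", b"\x01" * 32: b"node-1"}
b = dict(reversed(list(a.items())))
assert deterministic_digest(a) == deterministic_digest(b)
```

The same sorted-order principle is what keeping the salt static and writing single-threaded would have to guarantee at the NuDB file level.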

A different option would be to have a dedicated import/export format for shards (it could be as simple as CSV, since these are only key/value pairs of hex strings) instead of sharing database files. This would also help alternative use cases - NuDB is not exactly widely used, and RocksDB database files are probably not that easy to use in a shard import context.
The downside would be that the same data is stored up to three times (node_db, shard_db, export/import file).
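For illustration, a sketch of what such a CSV export/import could look like, assuming nodes are just key/value pairs of raw bytes (this layout is hypothetical, not an existing rippled format):

```python
import csv
import io

def export_nodes(nodes: dict[bytes, bytes]) -> str:
    """Write nodes as hex key/value rows, sorted by key for determinism."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for key in sorted(nodes):
        writer.writerow([key.hex(), nodes[key].hex()])
    return buf.getvalue()

def import_nodes(text: str) -> dict[bytes, bytes]:
    """Parse the hex rows back into a key/value map."""
    return {bytes.fromhex(k): bytes.fromhex(v)
            for k, v in csv.reader(io.StringIO(text))}

# Round trip: export and re-import yields the same node map.
nodes = {b"\x01" * 32: b"ledger header", b"\x02" * 32: b"inner node"}
assert import_nodes(export_nodes(nodes)) == nodes
```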

@MarkusTeufelberger (Collaborator, Author)

Revisiting this because I'm slightly confused:

> So the XRP ledger network calls the contents of ledgers [32570-32768] (both inclusive) "shard 1", [32769-49153] "shard 2" and so on.

How can 49153 be the last ledger of shard 2 if the last ledger in a shard will always be a multiple of 16384?

I think your example is off by one and the last ledger should be 49152 ((shardIndex + 1) * ledgers per shard --> (2 + 1) * 16384 = 49152), right?

@miguelportilla (Contributor)

@MarkusTeufelberger Correct. I am not sure how that 3 snuck in there. ((2 + 1) * 16384 = 49152)

@MarkusTeufelberger (Collaborator, Author)

Ah, I was already worried I misunderstood that part too. :-D

For future me and/or anyone else reading this, here's the shard calculation stuff in Python 3:

```python
# Constants for the XRP Ledger network
LEDGERS_PER_SHARD = 16384
EARLIEST_PUBLIC_LEDGER = 32570

def earliest_ledger_in_shard(shard_num: int) -> int:
    return max(EARLIEST_PUBLIC_LEDGER, 1 + (shard_num * LEDGERS_PER_SHARD))

def last_ledger_in_shard(shard_num: int) -> int:
    # the max() is a no-op for any shard index >= 1 on the XRP Ledger network
    return max(EARLIEST_PUBLIC_LEDGER, (shard_num + 1) * LEDGERS_PER_SHARD)

def shard_index_by_ledger(ledger_height: int) -> int:
    return (ledger_height - 1) // LEDGERS_PER_SHARD
```

@mDuo13 added the Documentation (README changes, code comments, etc.) label on Aug 2, 2019
@carlhua (Contributor) commented Sep 16, 2020

@MarkusTeufelberger can we close this issue? Also, we welcome your feedback on #3455.

@MarkusTeufelberger (Collaborator, Author)

I guess it can be closed; maybe @mDuo13 wants to look over this and see whether it is covered in the documentation by now.
