Roadmap for pruning historical data blobs #2033
Related core/app issue: celestiaorg/celestia-core#994
Discussion about what the storage window should be: #2034
Any thoughts on consolidating the storage and syncing protocols between consensus nodes and bridge/full/light nodes?
Deleting, or re-computing on the fly, the EDS for historical blocks on non-pruned nodes (along with milestone 2): because celestia-node stores the extended data, storing a 3MB original block (a 128x128 padded data square) takes up 29MB of storage, a roughly 10x blowup. There should be no reason for anyone to need to download the extended data for old blocks after the storage window expires. The extended data is only needed if there is a block withholding attack and the original data must be reconstructed, or when light nodes sample blocks. To my knowledge, there should also be no reason for light nodes to sample expired blocks, but if we do want to support that use case, it's possible to efficiently recompute specific samples without recomputing the EDS using the …
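The referenced method is cut off above, but the general idea can be sketched: to serve a single sample without the stored EDS, a node only needs to re-extend the one row (or column) containing that sample, not the whole square. Below is a minimal, hypothetical Go sketch using the github.com/klauspost/reedsolomon library purely for illustration (this is not celestia-node's actual erasure coding code path); `recomputeSample`, the share sizes, and the toy row are all assumptions.

```go
// Recompute a single sample from an original (unextended) row on demand,
// instead of keeping the full extended data square on disk.
// Illustrative sketch only; names and sizes are assumptions.
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

// recomputeSample returns the share at extended-row index idx (0 <= idx < 2k),
// given only the k original shares of that row.
func recomputeSample(original [][]byte, idx int) ([]byte, error) {
	k := len(original)
	if idx < k {
		return original[idx], nil // sample lies in the original half of the row
	}

	enc, err := reedsolomon.New(k, k) // k data shares, k parity shares per row
	if err != nil {
		return nil, err
	}

	// Lay out the row as [original... | parity...] and let Encode fill the parity.
	shares := make([][]byte, 2*k)
	copy(shares, original)
	for i := k; i < 2*k; i++ {
		shares[i] = make([]byte, len(original[0]))
	}
	if err := enc.Encode(shares); err != nil {
		return nil, err
	}
	return shares[idx], nil
}

func main() {
	// Toy 2-share row; real shares are fixed-size namespaced shares.
	row := [][]byte{[]byte("aaaa"), []byte("bbbb")}
	s, err := recomputeSample(row, 3)
	fmt.Println(s, err)
}
```

Re-encoding a single row is a small fraction of the work of recomputing the full extended square, which is what makes on-the-fly recomputation plausible.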
The storage model for a data availability network is that the network is only expected to store data blobs for a certain period of time (the "storage window") and to purge data blobs older than that window. At minimum, the storage window should probably be greater than or equal to the unbonding period. However, currently in celestia-node, all full nodes store all historical data, and all light nodes sample all historical block headers.
We propose a roadmap for moving to pruning of historical data blobs.
Milestone 1: light node sampling only within the storage window by default.
By default, make light nodes sample only blocks within the storage window, while still downloading historical headers. If a light node has been offline for longer than the unbonding period, it needs a trusted hash anyway. A minimal sketch of the per-header decision follows.
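The sketch below assumes a time-based window; `StorageWindow`, `Header`, and `shouldSample` are illustrative names, not celestia-node APIs.

```go
// Sketch of the Milestone 1 decision: sample only headers whose block time
// falls within the storage window. Headers are still downloaded either way.
package main

import (
	"fmt"
	"time"
)

// StorageWindow should be at least the unbonding period (value here is illustrative).
const StorageWindow = 21 * 24 * time.Hour

type Header struct {
	Height int64
	Time   time.Time
}

// shouldSample reports whether a light node should sample the block behind h.
func shouldSample(h Header, now time.Time) bool {
	return now.Sub(h.Time) <= StorageWindow
}

func main() {
	old := Header{Height: 1, Time: time.Now().Add(-30 * 24 * time.Hour)}
	fresh := Header{Height: 2, Time: time.Now().Add(-time.Hour)}
	fmt.Println(shouldSample(old, time.Now()), shouldSample(fresh, time.Now()))
}
```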
Milestone 2: implement full node pruning of blobs outside of the storage window.
Add an option for full nodes to prune historical blobs outside the storage window. Provide a way for nodes to discover non-pruned full nodes as well as pruned full nodes, so that nodes can still get historical blobs where they are available.
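One way the pruning half could look, as a hedged sketch: a background pruner walks forward from the oldest stored height and drops blob data until it reaches the storage window. The `Store` interface and its methods below are hypothetical; peer discovery of pruned vs. non-pruned nodes is a separate concern and is not shown.

```go
// Sketch of Milestone 2 pruning: delete blob data (not headers) for heights
// whose block time has left the storage window. Interface names are assumptions.
package pruner

import (
	"context"
	"time"
)

type Store interface {
	// OldestStoredHeight returns the lowest height still holding blob data.
	OldestStoredHeight(ctx context.Context) (uint64, error)
	// BlockTime returns the timestamp of the block at the given height.
	BlockTime(ctx context.Context, height uint64) (time.Time, error)
	// DeleteBlobs removes the stored blob data for a height.
	DeleteBlobs(ctx context.Context, height uint64) error
}

// Prune deletes blob data from the oldest stored height upward until it
// reaches the first block still inside the storage window.
func Prune(ctx context.Context, s Store, window time.Duration) error {
	h, err := s.OldestStoredHeight(ctx)
	if err != nil {
		return err
	}
	for {
		t, err := s.BlockTime(ctx, h)
		if err != nil {
			return err
		}
		if time.Since(t) <= window {
			return nil // reached the storage window; stop
		}
		if err := s.DeleteBlobs(ctx, h); err != nil {
			return err
		}
		h++
	}
}
```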
Milestone 3: implement namespaced partial storage full nodes.
Add an option for full nodes to follow and store the data for a set of specified namespaces, and to advertise themselves as providers for those namespaces. This will allow rollup full nodes to take responsibility for storing the historical data blobs of their namespace.
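A hedged sketch of the storage-side filter such a node might apply when processing a new block; `Namespace`, `Share`, and the namespace size are illustrative assumptions rather than celestia-node's actual types.

```go
// Sketch of Milestone 3: a partial storage full node keeps only shares whose
// namespace is in its configured set.
package partialstore

// Namespace size is an assumption for illustration.
type Namespace [29]byte

type Share struct {
	Namespace Namespace
	Data      []byte
}

// Filter returns only the shares this node is responsible for storing.
func Filter(shares []Share, tracked map[Namespace]struct{}) []Share {
	kept := make([]Share, 0, len(shares))
	for _, s := range shares {
		if _, ok := tracked[s.Namespace]; ok {
			kept = append(kept, s)
		}
	}
	return kept
}
```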
Milestone 4: implement equally-sharded partial storage nodes within the storage window.
Like milestone 3, but instead of storing by namespace, nodes store entire blocks based on which part of the chain they fall in. In this milestone we focus only on blobs within the storage window. This lets us increase the block size without increasing the requirements for running nodes that can serve light node sampling requests. A sketch of one possible assignment rule follows.
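As a sketch of one possible assignment rule (not a decided design), a node could be responsible for the heights congruent to its shard index modulo the number of shards:

```go
// Sketch of Milestone 4: blocks within the storage window are divided among
// partial storage nodes so that each stores an equal slice of the chain.
// The modulo assignment shown is one simple option, not a decided design.
package sharding

// Responsible reports whether the node with index nodeIdx (0 <= nodeIdx < numShards)
// should store the full block data at the given height.
func Responsible(height, nodeIdx, numShards uint64) bool {
	if numShards == 0 {
		return false
	}
	return height%numShards == nodeIdx
}
```

Other assignments (e.g. contiguous height ranges) would work equally well; the point is only that responsibility is split evenly across the window rather than by namespace.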
(stretch) Milestone 5: implement equally-sharded partial storage nodes outside of the storage window.
Like milestone 4, but for old blobs outside of the storage window. This functionality is outside the core guarantees of the data availability layer: it offers a means to download historical data blobs after the promised storage window has passed. It may still be useful as a protocol for pay-to-play historical blob storage providers, e.g. nodes you can only connect to if you pay them. This is essentially the same as the non-pruned nodes of Milestone 2, but sharded.