Roadmap for pruning historical data blobs #2033
Related core/app issue: celestiaorg/celestia-core#994
Discussion about what the storage window should be: #2034
Any thoughts on consolidating the storage and syncing protocols between consensus nodes and bridge/full/light nodes?
Deleting, or re-computing on the fly, the EDS for historical blocks on non-pruned nodes (along with milestone 2): because celestia-node stores the extended data, storing a 3MB original block (a 128x128 padded data square) takes up 29MB of storage, a roughly 10x blowup. There should be no reason for anyone to need to download the extended data for old blocks after the storage window expires. The extended data is only needed if there is a block withholding attack and the original data must be reconstructed, or when light nodes sample blocks. To my knowledge, there should also be no reason for light nodes to sample expired blocks, but if we do want to support that use case, it's possible to efficiently recompute specific samples without recomputing the EDS using the …
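The referenced method is cut off above, but the general idea can be sketched: to serve a single sample without the stored EDS, a node only needs to re-extend the one row (or column) containing that sample, not the whole square. Below is a minimal, hypothetical Go sketch using the github.com/klauspost/reedsolomon library purely for illustration (this is not celestia-node's actual erasure coding code path); `recomputeSample`, the share sizes, and the toy row are all assumptions.

```go
// Recompute a single sample from an original (unextended) row on demand,
// instead of keeping the full extended data square on disk.
// Illustrative sketch only; names and sizes are assumptions.
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

// recomputeSample returns the share at extended-row index idx (0 <= idx < 2k),
// given only the k original shares of that row.
func recomputeSample(original [][]byte, idx int) ([]byte, error) {
	k := len(original)
	if idx < k {
		return original[idx], nil // sample lies in the original half of the row
	}

	enc, err := reedsolomon.New(k, k) // k data shares, k parity shares per row
	if err != nil {
		return nil, err
	}

	// Lay out the row as [original... | parity...] and let Encode fill the parity.
	shares := make([][]byte, 2*k)
	copy(shares, original)
	for i := k; i < 2*k; i++ {
		shares[i] = make([]byte, len(original[0]))
	}
	if err := enc.Encode(shares); err != nil {
		return nil, err
	}
	return shares[idx], nil
}

func main() {
	// Toy 2-share row; real shares are fixed-size namespaced shares.
	row := [][]byte{[]byte("aaaa"), []byte("bbbb")}
	s, err := recomputeSample(row, 3)
	fmt.Println(s, err)
}
```

Re-encoding a single row is a small fraction of the work of recomputing the full extended square, which is what makes on-the-fly recomputation plausible.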
The storage model for a data availability network is that the network is only expected to store data blobs for a certain period of time (the "storage window") and to purge data blobs older than that window. At minimum, the storage window should probably be greater than or equal to the unbonding period. However, currently in celestia-node, all full nodes store all historical data, and all light nodes sample all historical block headers.
We propose a roadmap for moving to pruning of historical data blobs.
Milestone 1: light node sampling only within the storage window by default.
By default, make light nodes sample only blocks within the storage window, while still downloading historical headers. If a light node has been offline for longer than the unbonding period, it needs a trusted hash anyway. A minimal sketch of the per-header decision follows.
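The sketch below assumes a time-based window; `StorageWindow`, `Header`, and `shouldSample` are illustrative names, not celestia-node APIs.

```go
// Sketch of the Milestone 1 decision: sample only headers whose block time
// falls within the storage window. Headers are still downloaded either way.
package main

import (
	"fmt"
	"time"
)

// StorageWindow should be at least the unbonding period (value here is illustrative).
const StorageWindow = 21 * 24 * time.Hour

type Header struct {
	Height int64
	Time   time.Time
}

// shouldSample reports whether a light node should sample the block behind h.
func shouldSample(h Header, now time.Time) bool {
	return now.Sub(h.Time) <= StorageWindow
}

func main() {
	old := Header{Height: 1, Time: time.Now().Add(-30 * 24 * time.Hour)}
	fresh := Header{Height: 2, Time: time.Now().Add(-time.Hour)}
	fmt.Println(shouldSample(old, time.Now()), shouldSample(fresh, time.Now()))
}
```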
Milestone 2: implement full node pruning of blobs outside of the storage window.
Add an option for full nodes to prune historical blobs outside the storage window. Provide a way for nodes to discover non-pruned full nodes as well as pruned full nodes, so that nodes can still get historical blobs where they are available.
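One way the pruning half could look, as a hedged sketch: a background pruner walks forward from the oldest stored height and drops blob data until it reaches the storage window. The `Store` interface and its methods below are hypothetical; peer discovery of pruned vs. non-pruned nodes is a separate concern and is not shown.

```go
// Sketch of Milestone 2 pruning: delete blob data (not headers) for heights
// whose block time has left the storage window. Interface names are assumptions.
package pruner

import (
	"context"
	"time"
)

type Store interface {
	// OldestStoredHeight returns the lowest height still holding blob data.
	OldestStoredHeight(ctx context.Context) (uint64, error)
	// BlockTime returns the timestamp of the block at the given height.
	BlockTime(ctx context.Context, height uint64) (time.Time, error)
	// DeleteBlobs removes the stored blob data for a height.
	DeleteBlobs(ctx context.Context, height uint64) error
}

// Prune deletes blob data from the oldest stored height upward until it
// reaches the first block still inside the storage window.
func Prune(ctx context.Context, s Store, window time.Duration) error {
	h, err := s.OldestStoredHeight(ctx)
	if err != nil {
		return err
	}
	for {
		t, err := s.BlockTime(ctx, h)
		if err != nil {
			return err
		}
		if time.Since(t) <= window {
			return nil // reached the storage window; stop
		}
		if err := s.DeleteBlobs(ctx, h); err != nil {
			return err
		}
		h++
	}
}
```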
Milestone 3: implement namespaced partial storage full nodes.
Add an option for full nodes to follow and store the data for a set of specified namespaces, and to advertise themselves as providers for those namespaces. This will allow rollup full nodes to take responsibility for storing the historical data blobs of their namespace.
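A hedged sketch of the storage-side filter such a node might apply when processing a new block; `Namespace`, `Share`, and the namespace size are illustrative assumptions rather than celestia-node's actual types.

```go
// Sketch of Milestone 3: a partial storage full node keeps only shares whose
// namespace is in its configured set.
package partialstore

// Namespace size is an assumption for illustration.
type Namespace [29]byte

type Share struct {
	Namespace Namespace
	Data      []byte
}

// Filter returns only the shares this node is responsible for storing.
func Filter(shares []Share, tracked map[Namespace]struct{}) []Share {
	kept := make([]Share, 0, len(shares))
	for _, s := range shares {
		if _, ok := tracked[s.Namespace]; ok {
			kept = append(kept, s)
		}
	}
	return kept
}
```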
Milestone 4: implement equally-sharded partial storage nodes within the storage window.
Like milestone 3, but instead of storing by namespace, nodes store entire blocks based on which part of the chain they fall in. In this milestone we focus only on blobs within the storage window. This lets us increase the block size without increasing the requirements for running nodes that can serve light node sampling requests. A sketch of one possible assignment rule follows.
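As a sketch of one possible assignment rule (not a decided design), a node could be responsible for the heights congruent to its shard index modulo the number of shards:

```go
// Sketch of Milestone 4: blocks within the storage window are divided among
// partial storage nodes so that each stores an equal slice of the chain.
// The modulo assignment shown is one simple option, not a decided design.
package sharding

// Responsible reports whether the node with index nodeIdx (0 <= nodeIdx < numShards)
// should store the full block data at the given height.
func Responsible(height, nodeIdx, numShards uint64) bool {
	if numShards == 0 {
		return false
	}
	return height%numShards == nodeIdx
}
```

Other assignments (e.g. contiguous height ranges) would work equally well; the point is only that responsibility is split evenly across the window rather than by namespace.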
(stretch) Milestone 5: implement equally-sharded partial storage nodes outside of the storage window.
Like milestone 4, but for old blobs outside of the storage window. This functionality is outside the core guarantees of the data availability layer: it offers a means to download historical data blobs after the promised storage window has passed. It may still be useful as a protocol for pay-to-play historical blob storage providers, e.g. nodes you can only connect to if you pay them. This is essentially the same as the non-pruned nodes of Milestone 2, but sharded.