Compression is a long-running topic in several places and is probably somewhere on the roadmap, or at least on the horizon.
IMHO there are currently only two compression algorithms that are really interesting for IPFS to support: Brotli and zstd.
Brotli is supported by various browsers, so compressed files can be delivered directly from storage, through IPFS and an HTTP gateway, to the browser without being decompressed along the way.
The disadvantage of Brotli is its poor compression ratio on non-web content such as scientific data, binaries, source code, etc.
zstd is a very advanced compression algorithm that offers high decompression speed, while its extremely high compression ratios make it great for archiving. It also offers a wide range of compression levels.
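For a rough sense of the level trade-off, here is a minimal sketch using the python-zstandard bindings (the input file is just a placeholder for illustration):

```python
import zstandard

data = open("sample.bin", "rb").read()  # any file to experiment with

# Low level: fast, moderate ratio; high level: slower, better ratio.
fast = zstandard.ZstdCompressor(level=3).compress(data)
small = zstandard.ZstdCompressor(level=19).compress(data)

print(len(data), len(fast), len(small))
```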
zstd has no dictionary built in, which leads to poor compression ratios on many small files in the KB range as well as on chunked data (which is basically the use case IPFS has for it).
But zstd can build a static dictionary by analyzing the files to be compressed, and that dictionary can be stored next to the files.
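As a concrete sketch of that dictionary workflow, again with python-zstandard (the directory name and dictionary size are just assumptions; training needs a reasonably large number of sample chunks to work well):

```python
import pathlib
import zstandard

# Pretend ./chunks holds the many small chunks of one directory's files.
samples = [p.read_bytes() for p in pathlib.Path("chunks").iterdir()]

# Train a shared dictionary (here 16 KiB) from the sample chunks.
dictionary = zstandard.train_dictionary(16 * 1024, samples)

# The raw dictionary bytes can be stored "next to the files"
# (or, per the proposal below, as its own IPFS block).
dict_bytes = dictionary.as_bytes()

# Compress and decompress individual chunks with the shared dictionary.
compressor = zstandard.ZstdCompressor(dict_data=dictionary)
decompressor = zstandard.ZstdDecompressor(dict_data=dictionary)
compressed = compressor.compress(samples[0])
assert decompressor.decompress(compressed) == samples[0]
```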
Since this makes zstd much more useful for compressing IPFS data chunks, it would be quite interesting to embed the CID of the dictionary needed to read a given file, or all files of a given directory, in the directory's metadata. That way, IPFS could in the future let zstd analyze all chunks of all files in a folder, store the generated dictionary, compress all chunks with it, and offer the compressed data transparently through the API.
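No such metadata field exists today; purely as a hypothetical sketch of what reading a chunk could look like (the metadata key and the get_block helper are invented names, not part of any IPFS API):

```python
import zstandard

def read_chunk_with_shared_dict(dir_metadata, compressed_chunk, get_block):
    """Hypothetical: dir_metadata carries the CID of a shared zstd
    dictionary; get_block(cid) fetches raw block bytes from IPFS."""
    dict_cid = dir_metadata["zstd-dictionary-cid"]  # invented field name
    dictionary = zstandard.ZstdCompressionDict(get_block(dict_cid))
    return zstandard.ZstdDecompressor(dict_data=dictionary).decompress(compressed_chunk)
```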
The current consensus, as far as I know, is to compress in the transport (on the wire) and on disk, because compressing blocks before addressing (hashing) them (a) changes the hash and (b) forces all peers to support the same compression algorithm.
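To illustrate point (a): the same bytes produce a different digest once they are compressed before hashing (a minimal sketch, using sha256 as a stand-in for the multihash and python-zstandard for the compression):

```python
import hashlib
import zstandard

block = b"some raw IPFS block payload"

raw_digest = hashlib.sha256(block).hexdigest()
compressed_digest = hashlib.sha256(
    zstandard.ZstdCompressor().compress(block)
).hexdigest()

# The two digests differ, so the block's address changes.
assert raw_digest != compressed_digest
```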
Please read the linked issue (and the issues it links to, etc.) in the first paragraph before continuing this discussion. Otherwise, we'll just end up rehashing them.