Compression is a long-running topic in several places and is probably somewhere on the roadmap, or at least on the horizon.
IMHO there are currently only two compression algorithms that are really interesting for IPFS to support: Brotli and zstd.
Brotli is supported by various browsers, so compressed files can be delivered directly from storage, through IPFS and an HTTP gateway, to the browser without being decompressed along the way.
The disadvantage of Brotli is its poor compression ratio on non-web content such as scientific data, binaries, source code, etc.
zstd is a very advanced compression algorithm that offers high decompression speed, while its extremely high compression ratios make it great for archiving. It also offers a wide range of compression levels.
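For a rough sense of the level trade-off, here is a minimal sketch using the python-zstandard bindings (the input file is just a placeholder for illustration):

```python
import zstandard

data = open("sample.bin", "rb").read()  # any file to experiment with

# Low level: fast, moderate ratio; high level: slower, better ratio.
fast = zstandard.ZstdCompressor(level=3).compress(data)
small = zstandard.ZstdCompressor(level=19).compress(data)

print(len(data), len(fast), len(small))
```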
zstd has no dictionary built in, which leads to poor compression ratios on many small files in the KB range as well as on chunked data (which is basically the use case IPFS has for it).
But zstd can build a static dictionary by analyzing the files to be compressed, and that dictionary can be stored next to the files.
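As a concrete sketch of that dictionary workflow, again with python-zstandard (the directory name and dictionary size are just assumptions; training needs a reasonably large number of sample chunks to work well):

```python
import pathlib
import zstandard

# Pretend ./chunks holds the many small chunks of one directory's files.
samples = [p.read_bytes() for p in pathlib.Path("chunks").iterdir()]

# Train a shared dictionary (here 16 KiB) from the sample chunks.
dictionary = zstandard.train_dictionary(16 * 1024, samples)

# The raw dictionary bytes can be stored "next to the files"
# (or, per the proposal below, as its own IPFS block).
dict_bytes = dictionary.as_bytes()

# Compress and decompress individual chunks with the shared dictionary.
compressor = zstandard.ZstdCompressor(dict_data=dictionary)
decompressor = zstandard.ZstdDecompressor(dict_data=dictionary)
compressed = compressor.compress(samples[0])
assert decompressor.decompress(compressed) == samples[0]
```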
Since this makes zstd much more useful for compressing IPFS data chunks, it would be quite interesting to embed the CID of the dictionary needed to read a given file, or all files of a given directory, in the directory's metadata. That way, IPFS could in the future let zstd analyze all chunks of all files in a folder, store the generated dictionary, compress all chunks with it, and offer the compressed data transparently through the API.
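No such metadata field exists today; purely as a hypothetical sketch of what reading a chunk could look like (the metadata key and the get_block helper are invented names, not part of any IPFS API):

```python
import zstandard

def read_chunk_with_shared_dict(dir_metadata, compressed_chunk, get_block):
    """Hypothetical: dir_metadata carries the CID of a shared zstd
    dictionary; get_block(cid) fetches raw block bytes from IPFS."""
    dict_cid = dir_metadata["zstd-dictionary-cid"]  # invented field name
    dictionary = zstandard.ZstdCompressionDict(get_block(dict_cid))
    return zstandard.ZstdDecompressor(dict_data=dictionary).decompress(compressed_chunk)
```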
The current consensus, as far as I know, is to compress in the transport (on the wire) and on disk, because compressing blocks before addressing (hashing) them (a) changes the hash and (b) forces all peers to support the same compression algorithm.
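To illustrate point (a): the same bytes produce a different digest once they are compressed before hashing (a minimal sketch, using sha256 as a stand-in for the multihash and python-zstandard for the compression):

```python
import hashlib
import zstandard

block = b"some raw IPFS block payload"

raw_digest = hashlib.sha256(block).hexdigest()
compressed_digest = hashlib.sha256(
    zstandard.ZstdCompressor().compress(block)
).hexdigest()

# The two digests differ, so the block's address changes.
assert raw_digest != compressed_digest
```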
Please read the linked issue (and the issues it links to, etc.) in the first paragraph before continuing this discussion. Otherwise, we'll just end up rehashing them.