Is your feature request related to a problem? Please describe.
Zip files always retain an index (the central directory) stored separately from each entry's possibly-compressed data. This allows high-level split/merge operations to be performed without de/recompressing file contents, which benchmarks faster than serially iterating over each entry to extract, or over each input file to compress.
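For illustration, here is a minimal sketch (using the published zip crate, with a hypothetical input path) showing that entry metadata comes from this index alone; nothing below decompresses any file data:

```rust
use std::fs::File;
use zip::ZipArchive;

fn main() -> zip::result::ZipResult<()> {
    // Hypothetical input path.
    let file = File::open("example.zip")?;
    let mut archive = ZipArchive::new(file)?;
    for i in 0..archive.len() {
        // `by_index_raw` yields the entry without constructing a
        // decompressor, so only central-directory metadata is read.
        let entry = archive.by_index_raw(i)?;
        println!(
            "{}: {} compressed bytes starting at offset {}",
            entry.name(),
            entry.compressed_size(),
            entry.data_start()
        );
    }
    Ok(())
}
```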
Describe the solution you'd like
It's possible to extract zip files in parallel (see #72) as well as merge them to create archives in parallel (see discussion in #73).
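As a rough sketch of the extraction side (not an API the crate currently offers; the file names, worker count, and index striping are all illustrative), each worker can open an independent handle on the same archive and extract a disjoint subset of entries:

```rust
use std::fs::File;
use std::io;
use std::path::PathBuf;
use std::thread;
use zip::ZipArchive;

// One worker: open an independent handle on the same archive and extract
// only the entries whose indices were assigned to this worker.
fn extract_indices(path: &str, indices: Vec<usize>, dest: PathBuf) -> zip::result::ZipResult<()> {
    let mut archive = ZipArchive::new(File::open(path)?)?;
    for i in indices {
        let mut entry = archive.by_index(i)?;
        // Skip entries whose names would escape the destination.
        let Some(rel) = entry.enclosed_name() else { continue };
        let out_path = dest.join(rel);
        if entry.is_dir() {
            std::fs::create_dir_all(&out_path)?;
            continue;
        }
        if let Some(parent) = out_path.parent() {
            std::fs::create_dir_all(parent)?;
        }
        io::copy(&mut entry, &mut File::create(out_path)?)?;
    }
    Ok(())
}

fn main() -> zip::result::ZipResult<()> {
    let path = "example.zip"; // hypothetical input
    let num_entries = ZipArchive::new(File::open(path)?)?.len();
    let workers = 4;
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            // Stripe entry indices across workers so each extracts a
            // disjoint subset.
            let indices: Vec<usize> = (w..num_entries).step_by(workers).collect();
            thread::spawn(move || extract_indices(path, indices, PathBuf::from("out")))
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked")?;
    }
    Ok(())
}
```

Striping the indices is arbitrary; contiguous ranges per worker would work just as well.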
Describe alternatives you've considered
While parallel zip extraction as in #72 has likely been implemented elsewhere, to my knowledge the parallel split/merge technique in #73 (researched for pex-tool/pex#2175 and prototyped in https://github.com/cosmicexplorer/medusa-zip) has not been discussed or implemented before in other zip tooling (please let me know of any prior art for this!).
Additional context
TODO:
- As in that pex change, bulk copy with renaming enables reconstituting a "parent" zip file from an ordered sequence of "child" zips, which can be used to very quickly reconstruct large zip files from immutable cached components.
- When renaming is not required, ZipWriter::merge_contents() already works with a single io::copy() call. Bulk copy with rename avoids de/recompression of file data, but must edit each renamed local file header and therefore requires O(n) io::copy() calls.
- This zip crate should probably not get into the weeds of crawling the filesystem; that keeps medusa-zip useful as a separate crate and ensures we don't add too much extraneous code to this one.
- However, the process of merging an ordered sequence of "child" zips with ZipWriter::merge_contents() can itself be parallelized, and that is something the zip crate should be able to do (see the sketch after this list).
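Below is a minimal sketch of one way that merge could be parallelized: a two-way split over the ordered children, merged on separate threads into in-memory archives and then combined. It is not the crate's API; merge_contents() is used under the name given above, the Cursor-backed buffers are illustrative, and it assumes ZipArchive<Cursor<Vec<u8>>> can be sent across threads (exactly the kind of Send requirement a real implementation would need to satisfy).

```rust
use std::io::Cursor;
use std::thread;
use zip::{ZipArchive, ZipWriter};

/// An in-memory zip archive, used here so merged halves can be handed
/// between threads without touching the filesystem.
type MemZip = ZipArchive<Cursor<Vec<u8>>>;

/// Serially merge an ordered run of child zips into one in-memory zip.
/// No entry data is de/recompressed; each child is appended by raw copy.
fn merge_run(children: Vec<MemZip>) -> zip::result::ZipResult<MemZip> {
    let mut writer = ZipWriter::new(Cursor::new(Vec::new()));
    for child in children {
        // Method name as used in the issue text above.
        writer.merge_contents(child)?;
    }
    ZipArchive::new(writer.finish()?)
}

/// Split the ordered sequence in half, merge each half on its own thread,
/// then merge the two results; entry order is preserved throughout.
fn parallel_merge(mut children: Vec<MemZip>) -> zip::result::ZipResult<MemZip> {
    let right = children.split_off(children.len() / 2);
    // Assumes `MemZip: Send`; guaranteeing that is part of what a real
    // parallel implementation in the crate would have to provide.
    let left_handle = thread::spawn(move || merge_run(children));
    let right_merged = merge_run(right)?;
    let left_merged = left_handle.join().expect("merge thread panicked")?;
    merge_run(vec![left_merged, right_merged])
}
```

Generalizing the two-way split into a tree reduction over the ordered children would expose more parallelism for large numbers of cached components.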