remote upload: Skip gzip compression for files that are already compressed #161

Merged
merged 1 commit into master from trs/skip-compression-if-already-compressed on Mar 22, 2022

Conversation

@tsibley (Member) commented Mar 22, 2022

This avoids some useless work but mostly serves to head off confusion
(e.g. curl without the --compressed option) and/or quirks of HTTP
clients (e.g. Snakemake's HTTP remote file provider¹) when a compressed
file is compressed again and served with a Content-Encoding: gzip
header.²

This doesn't come up with Nextstrain dataset and narrative files but
does with adjacent input files like metadata.tsv.gz and
sequences.fasta.xz which we also put in the S3 buckets (e.g.
s3://nextstrain-data/files/zika/…).³

The remote family of commands is not intended for generic S3 management
per se, but these commands are often useful in the Nextstrain ecosystem
for managing these ancillary data files. Part of this is that the
commands are handy and available; part of it is that CloudFront
invalidation remains a complication when using `aws s3` directly.
Avoiding double compression doesn't take us far out of our way and
helps support this slightly off-label use case.

¹ snakemake/snakemake#1508
² https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1647910842228169
³ nextstrain/fauna#114
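
For context, the guardrail amounts to a check like the one below before gzip-compressing a file for upload. This is a minimal sketch, not the Nextstrain CLI's actual code: the helper names are hypothetical, and the magic-byte table just covers the common formats mentioned here (gzip, xz) plus bzip2 and zstd.

```python
import gzip
from pathlib import Path

# Magic bytes of formats treated as "already compressed"; for these we skip
# the extra gzip pass and the Content-Encoding: gzip header entirely.
COMPRESSED_MAGIC = (
    b"\x1f\x8b",          # gzip
    b"\xfd7zXZ\x00",      # xz
    b"BZh",               # bzip2
    b"\x28\xb5\x2f\xfd",  # zstd
)

def already_compressed(path: Path) -> bool:
    """Return True if the file starts with a known compression magic number."""
    with open(path, "rb") as f:
        header = f.read(6)
    return header.startswith(COMPRESSED_MAGIC)  # startswith accepts a tuple

def body_and_encoding(path: Path):
    """Return (bytes to upload, Content-Encoding header value or None)."""
    data = path.read_bytes()
    if already_compressed(path):
        # Upload as-is: no second compression layer, no Content-Encoding
        # header, so plain curl and Snakemake's HTTP remote see the file
        # byte-for-byte as it was given.
        return data, None
    # Otherwise compress for transfer and advertise it.
    return gzip.compress(data), "gzip"
```

With a check along these lines, files like metadata.tsv.gz and sequences.fasta.xz are stored unchanged, while uncompressed files still get gzip transfer encoding as before.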

@tsibley tsibley requested a review from a team March 22, 2022 21:20
@trvrb (Member) commented Mar 22, 2022

Thanks for including these guardrails @tsibley. Running with my original commands prevents the confusing behavior. Much appreciated.

@tsibley tsibley merged commit d04023d into master Mar 22, 2022
@tsibley tsibley deleted the trs/skip-compression-if-already-compressed branch March 22, 2022 22:48
@tsibley (Member, Author) commented Mar 22, 2022

Released with 3.2.1.
