Add the text-file flag #152

mvdnes · 2020-06-20T09:53:47Z

This adds the text flag which is stored in the internal file attributes.

mvdnes · 2020-06-20T09:54:35Z

This is in response to the difference in the zipinfo output as seen in #151. This does not solve that bug however.

rylev

Two things I’d like to check before we merge:

see the utf8 comment
can we add a test?

rylev · 2020-06-20T12:03:15Z

src/write.rs

+
+        // Check if the buffer is text (UTF8) or contains binary data. Note that if an UTF8
+        // character is split between to calls to write, this will falsly mark it as binary.
+        // For ASCII characters, this is always correct.


Can we use Utf8Error::valid_up_to to figure out if we’re in this situation? I think to know for sure we would have to keep the bytes from any previous read the are potentially a non-complete character and if validation fails, prepend and check again.

We could also just mark it as text for ASCII files. The appnote is a bit vague here:

4.4.14.1 The lowest bit of this field indicates, if set, that the file is apparently an ASCII or text file. If not set, that the file apparently contains binary data.

rylev · 2020-06-20T12:05:20Z

src/types.rs

@@ -231,6 +231,8 @@ pub struct ZipFileData {
    pub header_start: u64,
    /// Specifies where the compressed data of the file starts
    pub data_start: u64,
+    /// Internal file attributes
+    pub internal_attributes: u16,


This is out of scope for this PR but I wonder if keeping this internally as a bitflags enum would be better....

src/write.rs

Plecra · 2020-06-20T12:16:13Z

I'd rather not provide this API in this way - the issue came up because there are cases where it is significant whether or not this bit is set. There are cases where it will also be important that the flag isn't set, and this only leaves the user the option to deliberately provide invalid UTF-8.

There is also the issue with the silent failure when someone is expecting to create a text file, but that seems less important

That said, this does get the job done, and implementing this the way I'd like to requires a major rewrite of ZipWriter, so I don't have a leg to stand on here

mvdnes · 2020-06-20T12:25:22Z

I am fine leaving this out for now. I do not know of any use cases which actually read this flag, except for zipinfo. This was a quick addition to test if this fixes the epub problem, but it did not seem to have any effect.

Plecra · 2020-06-20T12:32:41Z

Thanks for bringing it up! We'll have to figure out a good solution in the end, but I don't think this is it - there are too many downsides. I'll let @rylev make the call since I got to it late :P

rylev · 2020-06-22T15:12:37Z

@mvdnes @Plecra do we know what the use cases are where utf-8 (or even just ascii) data should be marked as a binary file?

Plecra · 2020-06-22T20:06:32Z

For now, we don't have any examples of code that would check the flag, no.

I'm fairly sure there'll be some zip-derived format that enforces the binary flag for, say, images, and images that technically contain valid unicode.

Also also, this will create false negatives too: "text" doesn't necessarily mean utf8.

rylev · 2020-06-23T12:40:46Z

What seems most reasonable to me in the long run: offer a way for the user to decide how files are classified with helpers for such things as declaring all valid utf8 as text. The default would be ASCII being text and everything else being marked binary.

Plecra · 2020-06-23T15:40:37Z

I'm moving this to issues as there haven't been any requests for the feature, and we're still not sure how best to implement it.

Add the text-file flag

4dbe01a

This adds the text flag which is stored in the internal file attributes.

rylev suggested changes Jun 20, 2020

View reviewed changes

Plecra suggested changes Jun 20, 2020

View reviewed changes

src/write.rs Show resolved Hide resolved

Plecra closed this Jun 23, 2020

Plecra mentioned this pull request Jun 23, 2020

Expose text file attribute #160

Closed

Plecra mentioned this pull request Feb 1, 2023

Text flag for zip files #263

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add the text-file flag #152

Add the text-file flag #152

mvdnes commented Jun 20, 2020

mvdnes commented Jun 20, 2020

rylev left a comment

rylev Jun 20, 2020

mvdnes Jun 20, 2020

rylev Jun 20, 2020

Plecra commented Jun 20, 2020 •

edited

Loading

mvdnes commented Jun 20, 2020

Plecra commented Jun 20, 2020 •

edited

Loading

rylev commented Jun 22, 2020

Plecra commented Jun 22, 2020

rylev commented Jun 23, 2020

Plecra commented Jun 23, 2020

Add the text-file flag #152

Add the text-file flag #152

Conversation

mvdnes commented Jun 20, 2020

mvdnes commented Jun 20, 2020

rylev left a comment

Choose a reason for hiding this comment

rylev Jun 20, 2020

Choose a reason for hiding this comment

mvdnes Jun 20, 2020

Choose a reason for hiding this comment

rylev Jun 20, 2020

Choose a reason for hiding this comment

Plecra commented Jun 20, 2020 • edited Loading

mvdnes commented Jun 20, 2020

Plecra commented Jun 20, 2020 • edited Loading

rylev commented Jun 22, 2020

Plecra commented Jun 22, 2020

rylev commented Jun 23, 2020

Plecra commented Jun 23, 2020

Plecra commented Jun 20, 2020 •

edited

Loading

Plecra commented Jun 20, 2020 •

edited

Loading