Replaced panics by errors on invalid pages #188

Merged · 4 commits · Aug 20, 2022

Conversation

@evanrichter (Contributor) commented on Aug 17, 2022

Fuzzing was turning up errors again, so I am making best-guess fixes (one per commit).

Let me know if any of my fixes are the wrong idea and I can fix them up!

@codecov-commenter commented on Aug 17, 2022

Codecov Report

Merging #188 (e91f2b4) into main (a0e66c2) will decrease coverage by 0.09%.
The diff coverage is 42.85%.

@@            Coverage Diff             @@
##             main     #188      +/-   ##
==========================================
- Coverage   85.71%   85.62%   -0.10%     
==========================================
  Files          84       84              
  Lines        8227     8243      +16     
==========================================
+ Hits         7052     7058       +6     
- Misses       1175     1185      +10     
Impacted Files Coverage Δ
src/read/compression.rs 92.40% <27.27%> (-3.57%) ⬇️
src/read/page/reader.rs 86.31% <57.14%> (-1.26%) ⬇️
src/compression.rs 92.08% <66.66%> (-0.36%) ⬇️


@evanrichter (Contributor, Author) commented on Aug 18, 2022

Fuzzing seems pretty stable for now, but I will let it run overnight.

Edit: no further crashes found :)

@jorgecarleitao (Owner) left a comment


Looks great so far! I have one comment regarding the removal of try_reserve that I think we need to do something about.

@@ -200,13 +200,18 @@ pub(super) fn build_page<R: Read>(
let read_size: usize = page_header.compressed_page_size.try_into()?;

buffer.clear();
buffer.try_reserve(read_size)?;
@jorgecarleitao (Owner):

I think we need to introduce a new parameter to the reader that controls the maximum allowed (compressed) size of the page. Maybe we could re-use the parameter for the header and generalize it to the total page size (so, header + compressed). We would then check that read_size < max_size?

IMO this reserve is important, since the header gives us useful information that we would otherwise ignore.
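
For illustration, a minimal sketch of the guard being discussed here, assuming a hypothetical max_page_size parameter and an illustrative error type (neither is the crate's actual API):

    use std::collections::TryReserveError;

    // Illustrative error type for this sketch only.
    #[derive(Debug)]
    enum PageError {
        TooLarge { requested: usize, max: usize },
        Alloc(TryReserveError),
    }

    // Reserve space for a page only if its declared size is within the limit,
    // keeping the try_reserve so the header's size hint is not thrown away.
    fn reserve_page_buffer(
        buffer: &mut Vec<u8>,
        read_size: usize,
        max_page_size: usize,
    ) -> Result<(), PageError> {
        if read_size > max_page_size {
            return Err(PageError::TooLarge { requested: read_size, max: max_page_size });
        }
        buffer.clear();
        buffer.try_reserve(read_size).map_err(PageError::Alloc)?;
        Ok(())
    }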

@evanrichter (Contributor, Author):

Since it looks like buffer is a re-used Vec (is this right?), the allocation cost will amortize away after a couple of resizes. If we want to avoid the initial resizes and avoid adding a parameter, maybe we could do something like:

    // Cap the initial reservation; larger pages still work, the Vec just
    // grows past this as needed.
    const MAX_PREALLOC_SIZE: usize = 4 * 1024 * 1024;
    let prealloc_size = MAX_PREALLOC_SIZE.min(read_size);

    buffer.clear();
    buffer.try_reserve(prealloc_size)?;

The if bytes_read != read_size { ... } check is still necessary to catch out-of-spec page headers.

If a user has a file so large that it does not fit in memory, it would be expected that performance starts to degrade, right?

@jorgecarleitao (Owner):

Since it looks like buffer is a re-used Vec (is this right?), the allocation cost will amortize away after a couple of resizes.

Yes, but there are very large pages out there (from 1 MB to 1 GB, say). So not reserving can lead to multiple resizes of a potentially large number of bytes. In particular, read_to_end reads in chunks of 32 bytes ^1, so this statement will now be quite expensive (even the first time).

Another way of looking at it is: we are given a piece of information that is correct 99.9999% or so of the time (the compressed page length). Not using it in order to cater for the 0.0001% of cases (malicious files, etc.) is sub-optimal.

This is why I think we should consider a parameter, "max_page_size", and bail if we try to allocate more than that. I think this parameter should represent max_uncompressed_size, and we should use it both to limit the reading of the compressed page and to limit the number of bytes we decompress (to mitigate zip bombs). I think this is the parameter we are looking for to cater for both cases (over-allocation on read and zip bombs).

This also caters for the page header size, since that is decompressed by default.
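
For illustration only, a sketch of how a single limit could bound the compressed read; the function and parameter names (max_page_size, etc.) are assumptions, not the crate's API:

    use std::io::{Error, ErrorKind, Read, Result};

    // Read the declared compressed page into `buffer`, refusing pages whose
    // declared compressed or uncompressed size exceeds the single limit.
    fn read_page_bytes<R: Read>(
        reader: &mut R,
        compressed_size: usize,
        uncompressed_size: usize,
        max_page_size: usize,
        buffer: &mut Vec<u8>,
    ) -> Result<()> {
        if compressed_size > max_page_size || uncompressed_size > max_page_size {
            return Err(Error::new(
                ErrorKind::InvalidData,
                "page exceeds the configured maximum page size",
            ));
        }
        buffer.clear();
        // Reserve up front: the header gives us the size, so use it.
        buffer
            .try_reserve(compressed_size)
            .map_err(|_| Error::new(ErrorKind::OutOfMemory, "allocation failed"))?;
        // `take` caps the read at the declared size even if the stream is longer.
        let bytes_read = reader.take(compressed_size as u64).read_to_end(buffer)?;
        if bytes_read != compressed_size {
            return Err(Error::new(ErrorKind::UnexpectedEof, "page is truncated"));
        }
        Ok(())
    }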

@evanrichter (Contributor, Author) commented on Aug 19, 2022:

this parameter should represent max_uncompressed_size, and we should use it both to limit the reading of the compressed page and to limit the number of bytes we decompress

Yeah, that sounds good! The limit is implemented as a field on PageReader in the latest push. It actually replaces max_header_size because, if I understand correctly, any page-size limit should easily keep the header size in bounds as well.

This doesn't protect against zip bombs yet. Maybe adding the threshold as a field to CompressedDataPage and CompressedDictPage would be a good approach?
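
Roughly the shape being proposed, as a hypothetical sketch (the field and struct names here are illustrative; the crate's real PageReader, CompressedDataPage and CompressedDictPage have more fields than this):

    // Hypothetical sketch: the reader carries one limit and threads it into the
    // compressed pages it produces, so the decompression step can enforce the
    // same bound later (zip-bomb mitigation).
    struct PageReader<R> {
        reader: R,
        // Single knob: maximum allowed page size, compressed or uncompressed.
        max_page_size: usize,
    }

    struct CompressedDataPage {
        buffer: Vec<u8>,
        uncompressed_size: usize,
        // Carried along so decompression can refuse oversized output.
        max_page_size: usize,
    }

    impl<R> PageReader<R> {
        fn compressed_page(&self, buffer: Vec<u8>, uncompressed_size: usize) -> CompressedDataPage {
            CompressedDataPage {
                buffer,
                uncompressed_size,
                max_page_size: self.max_page_size,
            }
        }
    }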

@jorgecarleitao (Owner):

Exactly! :)

This doesn't protect against zip bombs yet. Maybe adding the threshold as a field to CompressedDataPage and CompressedDictPage would be a good approach?

Yes, my point is that, as an end user, you only need to think about one parameter instead of two (page_size, page_uncompressed) or three (page_header_size, page_size, page_uncompressed), so what we are doing here will not negatively impact the developer experience :)

Yes, we will need to introduce this parameter in the decompression path also ^^
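
For the decompression path, a hedged sketch of the same check (the decompress_fn callback and the names here are placeholders for whatever codec calls the crate actually uses):

    // Sketch only: enforce the same limit before allocating decompression output.
    fn decompress_page(
        compressed: &[u8],
        uncompressed_size: usize,
        max_page_size: usize,
        decompress_fn: impl Fn(&[u8], &mut [u8]) -> Result<usize, String>,
    ) -> Result<Vec<u8>, String> {
        if uncompressed_size > max_page_size {
            return Err(format!(
                "uncompressed page size ({uncompressed_size}) exceeds the maximum ({max_page_size})"
            ));
        }
        let mut out = Vec::new();
        out.try_reserve(uncompressed_size).map_err(|e| e.to_string())?;
        out.resize(uncompressed_size, 0);
        // The codec reports how many bytes it actually wrote.
        let written = decompress_fn(compressed, &mut out)?;
        out.truncate(written);
        Ok(out)
    }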

@evanrichter evanrichter marked this pull request as ready for review August 19, 2022 19:00
@jorgecarleitao jorgecarleitao changed the title from "fixes for fuzzing bugs" to "Replaced panics by errors on invalid pages" on Aug 20, 2022
@jorgecarleitao jorgecarleitao added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Aug 20, 2022
@jorgecarleitao jorgecarleitao merged commit d142a95 into jorgecarleitao:main Aug 20, 2022
Labels: enhancement (New feature or request)
Projects: None yet
3 participants