Column with empty dictionary but contains values causes panic #976
Thanks for the report! The invariant is that … Do you have the parquet file available?
Ah, that helps explain part of it. Long story short, I was using a file produced by arrow2. In a single column, the first … As a result of this, … This is ultimately passed to the code below, which always tries to slice `dict_values`, even though it may be zero-length:

arrow2/src/io/parquet/read/deserialize/binary/basic.rs Lines 331 to 340 in 9a38663
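To make the failure mode concrete, here is a minimal sketch. It assumes a simplified binary dictionary stored as an `offsets` vector (one entry more than the number of dictionary values) plus a concatenated `values` buffer; `BinaryDict` and `value_unchecked` are hypothetical names, not arrow2's API. With an empty dictionary, `offsets` is just `[0]`, so an unchecked lookup of index 0 goes out of bounds and panics:

```rust
// Hypothetical, simplified model of a binary dictionary (not arrow2's types):
// `offsets` has one more entry than there are dictionary values, and
// `values` holds the concatenated bytes of all dictionary entries.
struct BinaryDict {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl BinaryDict {
    /// Unchecked lookup: trusts that `index` (read from the data page)
    /// refers to an existing dictionary entry.
    fn value_unchecked(&self, index: usize) -> &[u8] {
        let start = self.offsets[index];
        let end = self.offsets[index + 1]; // out of bounds when the dictionary is empty
        &self.values[start..end]
    }
}

fn main() {
    // An empty dictionary: zero entries, so `offsets` is just [0].
    let dict = BinaryDict { offsets: vec![0], values: vec![] };

    // The data page still carries index 0, which refers to nothing.
    let result = std::panic::catch_unwind(|| dict.value_unchecked(0).to_vec());
    assert!(result.is_err());
    println!("lookup into empty dictionary panicked: {}", result.is_err());
}
```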
I am not sure why … If I am interpreting this correctly, there is no guard against a malformed parquet file whose dictionary is empty but whose data still contains values.
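A guard for this case could check the index against the dictionary length before slicing and return an error instead. The sketch below is hypothetical: `OutOfSpec` is a stand-in error type, and `BinaryDict` is the same simplified offsets-plus-bytes model, not arrow2's real types. `slice::get` returns `None` rather than panicking when a range is invalid, which also covers offsets that run past the values buffer.

```rust
/// Hypothetical error type for a malformed dictionary lookup.
#[derive(Debug, PartialEq)]
struct OutOfSpec(String);

// Simplified model: `offsets.len() == number_of_entries + 1`.
struct BinaryDict {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl BinaryDict {
    /// Number of dictionary entries (0 for an empty dictionary).
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Checked lookup: reports an error instead of panicking.
    fn value(&self, index: usize) -> Result<&[u8], OutOfSpec> {
        if index >= self.len() {
            return Err(OutOfSpec(format!(
                "dictionary index {} out of bounds for dictionary of length {}",
                index,
                self.len()
            )));
        }
        let (start, end) = (self.offsets[index], self.offsets[index + 1]);
        // `get` yields None for inverted or out-of-range offsets.
        self.values.get(start..end).ok_or_else(|| {
            OutOfSpec(format!(
                "offsets {}..{} exceed values length {}",
                start,
                end,
                self.values.len()
            ))
        })
    }
}

fn main() {
    let empty = BinaryDict { offsets: vec![0], values: vec![] };
    assert!(empty.value(0).is_err());

    let dict = BinaryDict { offsets: vec![0, 3, 5], values: b"fooba".to_vec() };
    assert_eq!(dict.value(0), Ok(&b"foo"[..]));
    assert_eq!(dict.value(1), Ok(&b"ba"[..]));
    assert!(dict.value(2).is_err());
}
```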
I found an issue where a small dictionary offset (length of 1) in `arrow2::io::parquet::read::deserialize::binary::basic::State::OptionalDictionary` causes the parquet reader to panic. I imagine restricting the slice bound is a fix. In general, there doesn't appear to be a way to propagate errors without some API changes.
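Restricting the slice bound avoids the panic without touching any function signatures. Here is a minimal sketch, assuming the same offsets-plus-bytes layout; `value_clamped` is a hypothetical helper, not part of arrow2. The trade-off is that a malformed file silently decodes to empty (or truncated) values instead of being reported.

```rust
/// Hypothetical clamped lookup: never panics, but yields an empty (or
/// truncated) slice when the dictionary does not cover `index`.
fn value_clamped<'a>(offsets: &[usize], values: &'a [u8], index: usize) -> &'a [u8] {
    // Clamp the start into [0, values.len()]; a missing offset maps to the end.
    let start = offsets
        .get(index)
        .copied()
        .unwrap_or(values.len())
        .min(values.len());
    // Clamp the end into [start, values.len()]; `checked_add` avoids overflow.
    let end = index
        .checked_add(1)
        .and_then(|i| offsets.get(i))
        .copied()
        .unwrap_or(start)
        .clamp(start, values.len());
    &values[start..end]
}

fn main() {
    // Empty dictionary: the lookup returns an empty value instead of panicking.
    let empty_values: [u8; 0] = [];
    assert!(value_clamped(&[0], &empty_values, 0).is_empty());

    let offsets = [0usize, 3, 5];
    let values = b"fooba";
    assert_eq!(value_clamped(&offsets, values, 1), &b"ba"[..]);
    assert!(value_clamped(&offsets, values, 7).is_empty());
}
```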
I assume that we should modify the return values of `extend_from_state` and `extend_from_new_page`?

arrow2/src/io/parquet/read/deserialize/binary/basic.rs Lines 331 to 340 in 9a38663
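Changing the return types would let the reader report the problem instead of panicking. Below is a rough sketch of what fallible versions of `extend_from_state` and `extend_from_new_page` might look like. The signatures, `OptionalDictionaryState`, `Decoded`, and the local `Error::OutOfSpec` enum are simplified stand-ins, not arrow2's actual definitions. Each invalid index becomes an `Err`, and callers propagate it with `?`.

```rust
/// Hypothetical stand-in for the crate's error enum.
#[derive(Debug, PartialEq)]
enum Error {
    OutOfSpec(String),
}

type Result<T> = std::result::Result<T, Error>;

/// Simplified decoding state: dictionary (offsets + bytes) plus the
/// remaining dictionary indices from the current page. Illustrative only.
struct OptionalDictionaryState<'a> {
    offsets: &'a [usize],
    dict_values: &'a [u8],
    indices: std::vec::IntoIter<usize>,
}

/// Output buffers of a binary array (offsets + values).
struct Decoded {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

/// Hypothetical fallible `extend_from_state`: decodes up to `additional`
/// items and returns an error instead of panicking on a bad index.
fn extend_from_state(
    state: &mut OptionalDictionaryState<'_>,
    decoded: &mut Decoded,
    additional: usize,
) -> Result<()> {
    let dict_len = state.offsets.len().saturating_sub(1);
    for index in state.indices.by_ref().take(additional) {
        if index >= dict_len {
            return Err(Error::OutOfSpec(format!(
                "dictionary index {} out of bounds for dictionary of length {}",
                index, dict_len
            )));
        }
        let value = state
            .dict_values
            .get(state.offsets[index]..state.offsets[index + 1])
            .ok_or_else(|| Error::OutOfSpec("dictionary offsets exceed values".to_string()))?;
        decoded.values.extend_from_slice(value);
        decoded.offsets.push(decoded.values.len());
    }
    Ok(())
}

/// Hypothetical caller: the error now propagates with `?`.
fn extend_from_new_page(
    state: &mut OptionalDictionaryState<'_>,
    decoded: &mut Decoded,
    additional: usize,
) -> Result<()> {
    extend_from_state(state, decoded, additional)?;
    Ok(())
}

fn main() {
    // Empty dictionary but the page has values: an error, not a panic.
    let mut state = OptionalDictionaryState { offsets: &[0], dict_values: &[], indices: vec![0, 0].into_iter() };
    let mut decoded = Decoded { offsets: vec![0], values: vec![] };
    assert!(extend_from_new_page(&mut state, &mut decoded, 2).is_err());

    // A well-formed dictionary decodes normally.
    let mut ok = OptionalDictionaryState { offsets: &[0, 3, 5], dict_values: b"fooba", indices: vec![1, 0].into_iter() };
    let mut out = Decoded { offsets: vec![0], values: vec![] };
    extend_from_new_page(&mut ok, &mut out, 2).unwrap();
    assert_eq!(out.values, b"bafoo".to_vec());
    assert_eq!(out.offsets, vec![0, 2, 5]);
}
```

The cost is that every caller up the chain must also return a `Result`, which is the API change the issue refers to.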