Slicing nullable list arrays into multiple parquet pages doesn't work #1356
It looks like the issue is in the definition level encoder. When the column is written as a single page, the repetition levels, definition levels, and values are as follows:
but when it is written as multiple pages you get:
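For reference, here is a minimal sketch of how repetition and definition levels are derived for a nullable list of nullable `i32`. It assumes the standard three-level Parquet LIST layout (optional list, repeated group, optional element), so the maximum repetition level is 1 and the maximum definition level is 3. The `levels` function is hypothetical and is not arrow2's encoder; it only illustrates what a correct encoding of such a column looks like, independent of how it is split into pages.

```rust
/// Hypothetical helper (not arrow2's API): computes (reps, defs, values) for a
/// column of nullable lists of nullable i32, assuming the 3-level LIST layout:
///   def 0 = list is null, def 1 = list is empty,
///   def 2 = element is null, def 3 = element is present (max_def).
///   rep 0 = first entry of a record, rep 1 = continuation of the same list.
fn levels(column: &[Option<Vec<Option<i32>>>]) -> (Vec<u8>, Vec<u8>, Vec<i32>) {
    let mut reps = Vec::new();
    let mut defs = Vec::new();
    let mut values = Vec::new();
    for row in column {
        match row {
            // Null list: one level entry, no value.
            None => {
                reps.push(0);
                defs.push(0);
            }
            // Empty list: one level entry, no value.
            Some(items) if items.is_empty() => {
                reps.push(0);
                defs.push(1);
            }
            // Non-empty list: one level entry per element.
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    reps.push(if i == 0 { 0 } else { 1 });
                    match item {
                        None => defs.push(2),
                        Some(v) => {
                            defs.push(3);
                            // Only non-null leaves are stored as values.
                            values.push(*v);
                        }
                    }
                }
            }
        }
    }
    (reps, defs, values)
}
```

For the column `[Some(vec![Some(1), None]), None, Some(vec![]), Some(vec![Some(4)])]` this yields repetition levels `[0, 1, 0, 0, 0]`, definition levels `[3, 2, 0, 1, 3]`, and values `[1, 4]`. Note that null and empty lists produce level entries but no values, so level counts and value counts diverge as soon as the column contains nulls.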
Commit tjwilson90@f8b0cca appears to partially address the issue, but it doesn't fix all the problems I'm seeing in the full (non-minimized) application I'm trying to upgrade, which builds lists of structs containing strings. One part that looks quite suspicious to me, though I haven't figured out how to fix it, is how
A similar problem exists with
arrow version 0.15
outputs
but should output
While debugging this in the context where I originally found it, the data page headers in the written parquet file appeared to be incorrect, so I'm fairly sure the problem is in writing, not reading. I suspect it's caused by splitting nullable list columns into multiple data pages, since increasing `data_pagesize_limit` enough to force a single page avoids the issue.

@ritchie46
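Because the bug only shows up once a column spans several data pages, a minimal sketch of one way to partition level streams into pages may help frame where offsets can go wrong. This is an illustration under stated assumptions, not arrow2's implementation: `split_pages` is hypothetical, it cuts pages only where the repetition level is 0 (so no record straddles two pages), and each page's value range counts only non-null leaves (definition level equal to the maximum), which is what a page's values section holds. An Arrow child array additionally has slots for null elements, so Arrow-side offsets per page would not equal these value ranges.

```rust
use std::ops::Range;

/// Hypothetical helper (not arrow2's API): groups level indices into pages of
/// at most `max_levels` entries without splitting a record, so every page
/// starts where the repetition level is 0. A single record longer than
/// `max_levels` still gets its own (oversized) page.
/// Returns, per page, the range of level indices and the range of
/// non-null leaf values (def == max_def) that belong in that page.
fn split_pages(
    reps: &[u8],
    defs: &[u8],
    max_def: u8,
    max_levels: usize,
) -> Vec<(Range<usize>, Range<usize>)> {
    assert_eq!(reps.len(), defs.len(), "level streams must be the same length");

    // Count the non-null leaf values within a span of levels.
    let count_values = |r: Range<usize>| defs[r].iter().filter(|&&d| d == max_def).count();

    // Level indices where a new record starts, plus an end sentinel.
    let mut bounds: Vec<usize> = (0..reps.len()).filter(|&i| reps[i] == 0).collect();
    bounds.push(reps.len());

    let mut pages = Vec::new();
    let mut page_start = 0; // first level index of the current page
    let mut value_start = 0; // first value index of the current page
    for w in bounds.windows(2) {
        let (rec_start, rec_end) = (w[0], w[1]);
        // Close the current page before this record if including the record
        // would exceed the limit (never emit an empty page).
        if rec_end - page_start > max_levels && rec_start > page_start {
            let n = count_values(page_start..rec_start);
            pages.push((page_start..rec_start, value_start..value_start + n));
            value_start += n;
            page_start = rec_start;
        }
    }
    // Flush the final page.
    if page_start < reps.len() {
        let n = count_values(page_start..reps.len());
        pages.push((page_start..reps.len(), value_start..value_start + n));
    }
    pages
}
```

For example, with repetition levels `[0, 1, 0, 0, 0]`, definition levels `[3, 2, 0, 1, 3]`, a maximum definition level of 3, and a limit of 2 levels per page, this yields the pages `(0..2, 0..1)`, `(2..4, 1..1)`, and `(4..5, 1..2)`. The middle page holds a null list and an empty list, so it carries levels but no values; a writer that advanced the value offset by rows or by levels instead of by non-null leaves would misalign every later page.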