Describe the bug
Writing an Arrow record batch that contains structs nested within lists with the Parquet writer produces a Parquet file with incorrect values when null or empty lists are present.
To Reproduce
The following program produces a Parquet file `out.parquet`.
Running `parquet-dump` on `out.parquet` produces the following output:
value 1: R:0 D:4 V:1
value 2: R:0 D:1 V:<null>
value 3: R:0 D:0 V:<null>
value 4: R:0 D:2 V:<null>
value 5: R:1 D:2 V:<null>
value 6: R:0 D:3 V:<null>
value 7: R:0 D:4 V:0
Expected behavior
The last value (value 7) should have been 2 rather than 0:
value 1: R:0 D:4 V:1
value 2: R:0 D:1 V:<null>
value 3: R:0 D:0 V:<null>
value 4: R:0 D:2 V:<null>
value 5: R:1 D:2 V:<null>
value 6: R:0 D:3 V:<null>
value 7: R:0 D:4 V:2
Additional context
The `filter_array_indices` function in https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/levels.rs#L760 produces incorrect indices when the immediate parent of a field is not a list. In the writer (https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_writer.rs#L244), those indices are then used to select the values to write at https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_writer.rs#L284, which causes the incorrect behavior described above.
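
To make the expected indices concrete, here is a minimal sketch of how definition levels could be mapped to positions in the leaf array. It is not the arrow-rs implementation: the function name `leaf_value_indices` and the parameter `list_slot_level` are hypothetical. It also assumes a schema inferred from the dump above (a nullable list of nullable structs with one nullable integer field), so that definition level 0 means a null list, 1 an empty list, 2 a null struct, 3 a null field, and 4 a present value.

```rust
/// Hypothetical helper, not part of arrow-rs.
///
/// Returns the positions in the leaf (child) array that hold non-null values.
/// `list_slot_level` is the lowest definition level at which a list element
/// exists, and therefore occupies a slot in the child array (assumed to be 2
/// for the schema described above).
fn leaf_value_indices(
    def_levels: &[i16],
    list_slot_level: i16,
    max_def_level: i16,
) -> Vec<usize> {
    let mut indices = Vec::new();
    let mut slot = 0usize; // current position in the child array

    for &def in def_levels {
        // Null lists and empty lists produce a level entry but no child slot,
        // so they must not advance the position.
        if def >= list_slot_level {
            if def == max_def_level {
                // Fully defined: this slot holds a value that must be written.
                indices.push(slot);
            }
            slot += 1;
        }
    }
    indices
}
```

Under these assumptions, `leaf_value_indices(&[4, 1, 0, 2, 2, 3, 4], 2, 4)` yields `[0, 4]`: the child array has five slots, and the values 1 and 2 sit at positions 0 and 4. Any index computation that also advances the position for the null list or the empty list would point past the second value.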