Parquet writes incorrect `List<u32>` #1368

ritchie46 · 2023-01-18T09:52:25Z

At the boundary of 349_526 rows, with 349_525 nulls and only the last value specified, the parquet file that is written is incorrect.

This also seems to be related to the row group size; see the original issue report: pola-rs/polars#6289

The most minimal example I could make is:

import io

import polars as pl

f = io.BytesIO()
# 349_525 nulls followed by a single non-null list value
df = pl.Series('a', [*[None]*349_525, [1, 2]], dtype=pl.List(pl.UInt32)).to_frame()
print(df.tail(1))

df.write_parquet(f)
f.seek(0)  # rewind the buffer before reading it back
print(pl.read_parquet(f).tail(1))  # we expect the same `[1, 2]` here, but we get `[null, null]`

shape: (1, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[u32] │
╞═══════════╡
│ [1, 2]    │
└───────────┘
shape: (1, 1)
┌──────────────┐
│ a            │
│ ---          │
│ list[u32]    │
╞══════════════╡
│ [null, null] │
└──────────────┘

The state of the df is:

print(df)

shape: (349526, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[u32] │
╞═══════════╡
│ null      │
│ null      │
│ null      │
│ null      │
│ ...       │
│ null      │
│ null      │
│ null      │
│ [1, 2]    │
└───────────┘

When we use the pyarrow backend for writing, the output is as expected.

ritchie46 added the bug Something isn't working label Jan 18, 2023

ritchie46 changed the title ~~Parquet writes incorrect List<u32>~~ Parquet writes incorrect List<u32> Jan 18, 2023

ritchie46 mentioned this issue Jan 19, 2023

fix(python): default to pyarrow for writing parquet pola-rs/polars#6313

Merged

tustvold mentioned this issue Feb 8, 2023

RFC: Use Apache Arrow Parquet Crate pola-rs/polars#6735

Closed

jorgecarleitao mentioned this issue Feb 10, 2023

Fixed writing nested parquet #1390

Merged

jorgecarleitao closed this as completed in #1390 Feb 10, 2023
