This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

invalid written parquet file of nested structures. (Mixing list with structs) #1325

Closed
ritchie46 opened this issue Dec 10, 2022 · 2 comments · Fixed by #1347
Labels
bug Something isn't working

Comments

@ritchie46
Collaborator

Arrow2 can read the file it wrote, but pyarrow fails with:

OSError: Malformed levels. min: 0 max: 3 out of range.  Max Level: 2

MWE

import polars as pl
import pyarrow.parquet as pq

pl.from_records(
    [
        dict(
            id=1,
            list_of_structs_col=[
                dict(a=10, b=[10, 11, 12]),
                dict(a=11, b=[13, 14, 15]),
            ],
        ),
        dict(
            id=2,
            list_of_structs_col=[
                dict(a=44, b=[12]),
            ],
        ),
    ]
).write_parquet("/tmp/out.parquet")

print(pl.read_parquet("/tmp/out.parquet"))  # succeeds
print(pq.read_table("/tmp/out.parquet"))  # fails
ritchie46 added the bug label Dec 11, 2022
ritchie46 changed the title from "pyarrow cannot read file written by arrow2." to "invalid written parquet file of nested structures." Dec 11, 2022
@ritchie46
Collaborator Author

I also noticed that although arrow2 can read the file, the content it reads back is incorrect.

ritchie46 changed the title from "invalid written parquet file of nested structures." to "invalid written parquet file of nested structures. (Mixing list with structs)" Dec 11, 2022
@ritchie46
Collaborator Author

The issue is in the repetition levels.

They are [0, 3, 3, 1, 3, 3, 0] but should be [0, 2, 2, 1, 2, 2, 0].

Still have to figure out how the conversion works to be able to fix it.
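
For reference, the expected levels can be derived by hand. The leaf b sits under two repeated fields (the outer list and the inner list), so its maximum repetition level is 2, which is why a level of 3 is rejected. The sketch below is not arrow2's conversion code; repetition_levels is a hypothetical helper that assumes no nulls and no empty lists, and it only reproduces the expected levels for the MWE data.

```python
def repetition_levels(rows):
    """Repetition levels for leaf `b` in a list<struct<a, b: list<int>>> column.

    Hypothetical helper for illustration; assumes no nulls and no empty lists.
    """
    levels = []
    for structs in rows:                # one entry per record
        first_in_row = True
        for struct in structs:          # outer list: repetition level 1
            first_in_struct = True
            for _ in struct["b"]:       # inner list: repetition level 2
                if first_in_row:
                    levels.append(0)    # value starts a new record
                elif first_in_struct:
                    levels.append(1)    # value starts a new struct in the outer list
                else:
                    levels.append(2)    # value continues the inner list
                first_in_row = False
                first_in_struct = False
    return levels


# Same data as the list_of_structs_col column in the MWE.
rows = [
    [{"a": 10, "b": [10, 11, 12]}, {"a": 11, "b": [13, 14, 15]}],
    [{"a": 44, "b": [12]}],
]
levels = repetition_levels(rows)
print(levels)
```

The written file uses 3 where this yields 2, so the writer appears to count one nesting level too many for the inner list.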
