Support null values when reading from Parquet files #1740

Closed
leesavoie-voltaiq opened this issue Nov 10, 2021 · 3 comments · Fixed by #1747

@leesavoie-voltaiq

If I run code like the following on a Parquet file that contains nulls, I get an error:

```python
import polars as pl

# Placeholder: path to a Parquet file containing nulls
pqt_file = "path/to/file_with_nulls.parquet"
pl.scan_parquet(pqt_file).select(pl.col("*")).collect()
```

The error is Any(ArrowError(NotYetImplemented("Reading Null from parquet still not implemented"))).

If I instead read the data using PyArrow first, I get a different error:

```python
import polars as pl
import pyarrow.dataset as ds

# Placeholder: path to a Parquet file containing nulls
pqt_file = "path/to/file_with_nulls.parquet"
data = ds.dataset(pqt_file)
df = pl.from_arrow(data.to_table())
df.lazy().select(pl.col("*")).collect()
```

In this case, the error is InvalidArgumentError("all columns in a record batch must have the same length"), though I suspect the underlying issue is the same. This only seems to happen when using the lazy API; if I read the file using pl.read_parquet it seems to work fine.

Is reading nulls from Parquet expected to be implemented any time soon?

@jorgecarleitao
Collaborator

Fixed upstream: jorgecarleitao/arrow2#598

Note that the issue is not reading nulls in general, but reading the "null" logical data type, a type representing a column of only nulls.

@leesavoie-voltaiq
Author

Thanks, I didn't realize this only happens when the column type is null. That helps. I'll keep an eye on your PR.

@leesavoie-voltaiq
Author

I can confirm that this is now working for me as of Polars 0.10.19 on pypi.org. Thanks for the quick turnaround.
