Support null values when reading from Parquet files #1740

leesavoie-voltaiq · 2021-11-10T17:00:26Z

If I run code like the following on a Parquet file that contains nulls, I get an error:

import polars as pl
pqt_file = "<path to a Parquet file containing nulls>"  # placeholder path
pl.scan_parquet(pqt_file).select(pl.col("*")).collect()

The error is Any(ArrowError(NotYetImplemented("Reading Null from parquet still not implemented"))).

If I instead read the data using PyArrow first, I get a different error:

import polars as pl
import pyarrow.dataset as ds

pqt_file = "<path to a Parquet file containing nulls>"  # placeholder path
data = ds.dataset(pqt_file)
df = pl.from_arrow(data.to_table())
df.lazy().select(pl.col("*")).collect()

In this case, the error is InvalidArgumentError("all columns in a record batch must have the same length"), though I suspect the underlying issue is the same. This only seems to happen when using the lazy API; if I read the file using pl.read_parquet it seems to work fine.

Is reading nulls from Parquet expected to be implemented any time soon?


jorgecarleitao · 2021-11-11T21:50:06Z

Fixed upstream: jorgecarleitao/arrow2#598

Note that the issue is not reading nulls in general, but reading the "null" logical data type, a type representing a column of only nulls.

leesavoie-voltaiq · 2021-11-11T23:18:47Z

Thanks, I didn't realize this only happens when the column type is null. That helps. I'll keep an eye on your PR.

leesavoie-voltaiq · 2021-11-12T23:31:30Z

I can confirm that this is now working for me as of Polars 0.10.19 on pypi.org. Thanks for the quick turnaround.

ritchie46 mentioned this issue Nov 12, 2021: update arrow + python 0.10.19 #1747 (Merged)

ritchie46 closed this as completed in #1747 Nov 12, 2021
