Support reading byte stream split encoded Parquet data #17042
Labels
enhancement
New feature or an improvement of an existing feature
needs decision
Awaiting decision by a maintainer
Description
Parquet's byte stream split encoding can be very useful for compactly storing floating-point data (for some measurements, see the "(reference) Float32 data" section in https://issues.apache.org/jira/browse/PARQUET-2414), so it would be great if Polars supported reading files that use this encoding.
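For context, here is a minimal sketch of what the encoding does for `f32` values, based on my reading of the Parquet format spec: byte `k` of every value is scattered into stream `k`, and the streams are concatenated in order. The function names are hypothetical and this is not the polars-parquet implementation, just an illustration of the layout a reader would have to undo.

```rust
/// Encode `f32` values using the byte stream split layout.
/// Byte `k` of value `i` (little-endian) is written to position `k * n + i`,
/// where `n` is the number of values.
/// NOTE: hypothetical helper for illustration, not a polars-parquet API.
fn byte_stream_split_encode(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * 4];
    for (i, v) in values.iter().enumerate() {
        for (k, b) in v.to_le_bytes().iter().enumerate() {
            out[k * n + i] = *b;
        }
    }
    out
}

/// Decode a byte stream split buffer back into `f32` values.
/// Returns `None` if the buffer length is not a multiple of 4.
/// NOTE: hypothetical helper for illustration, not a polars-parquet API.
fn byte_stream_split_decode(data: &[u8]) -> Option<Vec<f32>> {
    if data.len() % 4 != 0 {
        return None;
    }
    let n = data.len() / 4;
    let values = (0..n)
        .map(|i| {
            // Gather one byte from each of the four streams.
            f32::from_le_bytes([data[i], data[n + i], data[2 * n + i], data[3 * n + i]])
        })
        .collect();
    Some(values)
}
```

The encoding does not shrink the data by itself. Grouping the sign/exponent bytes and the mantissa bytes into separate streams tends to make the page more compressible by a general-purpose codec applied afterwards.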
There is an open issue for this feature in the parquet2 repository (jorgecarleitao/parquet2#208), but since Polars has forked parquet2 into its own polars-parquet crate, I'm opening an issue here as well.
I'm happy to implement this feature if it would be accepted, and I have already made some progress on getting the basic functionality working.