Support reading byte stream split encoded Parquet data #17042

Closed
adamreeve opened this issue Jun 18, 2024 · 0 comments · Fixed by #17099
Labels
enhancement: New feature or an improvement of an existing feature
needs decision: Awaiting decision by a maintainer

Comments

Contributor

adamreeve commented Jun 18, 2024

Description

Parquet's byte stream split encoding can be very useful for compactly storing floating point data (e.g. see the "(reference) Float32 data" section in https://issues.apache.org/jira/browse/PARQUET-2414 for some measurements), so it would be great if Polars supported reading files that use this encoding.
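
For anyone unfamiliar with the encoding, here is a rough sketch of what it does for float32 values, based on the description in the Parquet format specification: byte 0 of every value goes into one stream, byte 1 into the next, and so on, and the streams are concatenated. The encoding does not shrink the data by itself; it groups similar bytes together so that a general-purpose compressor applied afterwards tends to do better. The function names below are made up for illustration and are not part of Polars or polars-parquet.

```python
import struct


def byte_stream_split_encode(values, width=4):
    """Illustrative encoder for float32 values (hypothetical helper)."""
    # Pack the values as little-endian float32, the plain Parquet layout.
    raw = struct.pack(f"<{len(values)}f", *values)
    # Stream i holds byte i of every value; streams are concatenated.
    return b"".join(raw[i::width] for i in range(width))


def byte_stream_split_decode(data, width=4):
    """Illustrative decoder: re-interleave the streams into values."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of the value width")
    n = len(data) // width
    out = bytearray(len(data))
    for i in range(width):
        # Scatter stream i back into byte position i of each value.
        out[i::width] = data[i * n:(i + 1) * n]
    return list(struct.unpack(f"<{n}f", bytes(out)))


values = [1.0, 2.5, -3.25]
encoded = byte_stream_split_encode(values)
decoded = byte_stream_split_decode(encoded)
```

A real reader would work on whole data pages and also handle other value widths such as 8 bytes for doubles, but the byte shuffling is the same idea.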

There is an issue open against the parquet2 repository for this feature (jorgecarleitao/parquet2#208), but given that Polars has forked parquet2 into its own polars-parquet crate, I'm opening an issue here as well.

I'm happy to implement this feature if it would be accepted, and have made some progress getting the basic functionality working already.

adamreeve added the enhancement label Jun 18, 2024
deanm0000 added the needs decision label Jun 20, 2024