[SPARK-45891][SQL] Rebuild variant binary from shredded data. #48851

Open: wants to merge 1 commit into master

chenhao-db
Contributor

What changes were proposed in this pull request?

It implements the variant rebuild functionality according to the current shredding spec in apache/parquet-format#461, which allows the Parquet reader to read shredded variant data.
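To illustrate what "rebuild" means here, the following is a rough conceptual sketch, not the code in this PR. It assumes the general shape described in the shredding spec: each shredded group carries an optional untyped `value` and an optional `typed_value`, and an object's `typed_value` is a group of per-field shredded groups. Plain Python objects stand in for the variant binary encoding, and the names `rebuild` and `MISSING` are hypothetical.

```python
from typing import Any

# Hypothetical sentinel: the field is absent from the rebuilt object.
MISSING = object()


def rebuild(shredded: dict) -> Any:
    """Rebuild one value from a simplified shredded group.

    `shredded` may contain "value" (the untyped residual) and
    "typed_value" (the shredded column). This toy model does not
    distinguish a variant null from an absent column value.
    """
    value = shredded.get("value")
    typed = shredded.get("typed_value")

    if typed is None:
        # Nothing was shredded: use the residual, or report the field missing.
        return MISSING if value is None else value

    if isinstance(typed, dict):
        # Shredded object: rebuild each shredded field, skip missing ones,
        # then merge in the fields that stayed in the residual object.
        result = {}
        for name, field in typed.items():
            rebuilt = rebuild(field)
            if rebuilt is not MISSING:
                result[name] = rebuilt
        if isinstance(value, dict):
            result.update(value)
        return result

    if isinstance(typed, list):
        # Shredded array: each element is itself a shredded group.
        return [rebuild(element) for element in typed]

    # Shredded scalar: the typed column holds the value directly.
    return typed


row = {
    "typed_value": {
        "a": {"typed_value": 1},   # shredded into a typed column
        "b": {"value": "x"},       # type mismatch, kept as untyped value
        "c": {},                   # missing in this row
    },
    "value": {"d": True},          # field that was not part of the schema
}
rebuilt_row = rebuild(row)
```

In this model, `rebuilt_row` contains `a`, `b`, and `d`, while `c` is dropped because neither of its columns is populated. The real implementation works on the variant binary format and the Parquet schema rather than on Python dicts.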

Why are the changes needed?

It gives Spark the basic ability to read shredded variant data. It can be improved in the future to read only requested fields.

Does this PR introduce any user-facing change?

Yes, the Parquet reader will be able to read shredded variant data.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@chenhao-db
Contributor Author

@gene-db @cashmand @cloud-fan could you help review? Thanks!
