Write Multiple RecordBatch to Parquet Row Group #1211

Closed
tustvold opened this issue Jan 19, 2022 · 0 comments · Fixed by #1214
Labels
enhancement Any new improvement worthy of an entry in the changelog

Comments

@tustvold
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Parquet row groups are meant to contain large numbers of rows. This helps amortize statistics, metadata, and IO overheads, and make the best use of dictionary encoding.

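Currently, every call to ArrowWriter::write creates a new row group. This is unfortunate, because writing many small RecordBatches yields many small row groups and loses the benefits described above.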

Describe the solution you'd like

ArrowWriter should buffer rows across calls to write and only close a row group once it exceeds the configured WriterProperties::max_row_group_size.
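
To illustrate the requested behaviour, here is a minimal, standard-library-only sketch of the buffering logic. It is not the ArrowWriter implementation: `BufferingWriter` is a hypothetical stand-in, rows are modelled as plain `i64` values rather than RecordBatch columns, and a "row group" is just a `Vec`. It also assumes one possible reading of the proposal, namely that a row group is closed as soon as the buffer reaches `max_row_group_size`, so no group is larger than the limit.

```rust
/// Hypothetical stand-in for a writer that buffers rows across `write` calls.
/// This is an illustration only, not the parquet crate's ArrowWriter.
struct BufferingWriter {
    /// Maximum number of rows per row group (assumed to be greater than zero).
    max_row_group_size: usize,
    /// Rows received so far that have not yet been flushed to a row group.
    buffered: Vec<i64>,
    /// Row groups that have been closed, in write order.
    row_groups: Vec<Vec<i64>>,
}

impl BufferingWriter {
    fn new(max_row_group_size: usize) -> Self {
        assert!(max_row_group_size > 0, "row group size must be non-zero");
        Self {
            max_row_group_size,
            buffered: Vec::new(),
            row_groups: Vec::new(),
        }
    }

    /// Appends a batch to the buffer and closes as many full row groups
    /// as the buffered rows allow. A small batch does not close a group.
    fn write(&mut self, batch: &[i64]) {
        self.buffered.extend_from_slice(batch);
        while self.buffered.len() >= self.max_row_group_size {
            // Keep the first `max_row_group_size` rows as a full group and
            // leave the remainder buffered for the next row group.
            let rest = self.buffered.split_off(self.max_row_group_size);
            let full = std::mem::replace(&mut self.buffered, rest);
            self.row_groups.push(full);
        }
    }

    /// Flushes any remaining rows as a final, possibly smaller, row group
    /// and returns all row groups.
    fn close(mut self) -> Vec<Vec<i64>> {
        if !self.buffered.is_empty() {
            let last = std::mem::take(&mut self.buffered);
            self.row_groups.push(last);
        }
        self.row_groups
    }
}
```

With a limit of 4 rows, writing three batches of 3 rows each would produce row groups of 4, 4, and 1 rows in this sketch, rather than three row groups of 3 rows each.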
