Simplified API to write files #78

Merged
merged 6 commits into from
Feb 2, 2022

Conversation

jorgecarleitao (Owner) commented Jan 21, 2022

This simplifies the API for writing a file by giving the user more control over when a row group is emitted and written.
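
As a rough illustration of what "control over when to emit a row group" can look like, here is a minimal, std-only sketch. It is not the API introduced by this PR: `ToyFileWriter`, `start`, `write_row_group`, `end`, and `write_in_groups` are hypothetical names, and the "footer" is just a row-group count rather than real Parquet metadata. Only the leading and trailing `PAR1` magic bytes mirror the actual file format.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a Parquet-style file writer (not the crate's real type).
pub struct ToyFileWriter<W: Write> {
    inner: W,
    row_groups: u64,
}

impl<W: Write> ToyFileWriter<W> {
    pub fn new(inner: W) -> Self {
        Self { inner, row_groups: 0 }
    }

    /// Writes the leading magic bytes.
    pub fn start(&mut self) -> io::Result<()> {
        self.inner.write_all(b"PAR1")
    }

    /// Writes one row group; the caller decides when this happens.
    pub fn write_row_group(&mut self, rows: &[u32]) -> io::Result<()> {
        for row in rows {
            self.inner.write_all(&row.to_le_bytes())?;
        }
        self.row_groups += 1;
        Ok(())
    }

    /// Writes a toy footer (the row-group count) and the trailing magic bytes,
    /// then hands back the underlying writer.
    pub fn end(mut self) -> io::Result<W> {
        self.inner.write_all(&self.row_groups.to_le_bytes())?;
        self.inner.write_all(b"PAR1")?;
        Ok(self.inner)
    }
}

/// Emits one row group per `max_rows` rows. `max_rows` must be non-zero,
/// because `chunks` panics on a chunk size of 0.
pub fn write_in_groups(rows: &[u32], max_rows: usize) -> io::Result<Vec<u8>> {
    let mut writer = ToyFileWriter::new(Vec::new());
    writer.start()?;
    for chunk in rows.chunks(max_rows) {
        writer.write_row_group(chunk)?;
    }
    writer.end()
}
```

Calling `write_in_groups(&[1, 2, 3, 4, 5], 2)` would emit three row groups. The point of the sketch is that the grouping policy lives in the caller, not in the writer.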

Thanks to /u/dexterduck for the feedback on reddit that led to this PR!

codecov-commenter commented Jan 21, 2022

Codecov Report

Merging #78 (e24554b) into main (2194ee4) will decrease coverage by 1.54%.
The diff coverage is 2.35%.

@@            Coverage Diff             @@
##             main      #78      +/-   ##
==========================================
- Coverage   69.30%   67.76%   -1.55%     
==========================================
  Files          66       68       +2     
  Lines        3691     3800     +109     
==========================================
+ Hits         2558     2575      +17     
- Misses       1133     1225      +92     
Impacted Files                        Coverage Δ
integration-tests/src/lib.rs          0.00% <0.00%> (ø)
src/write/file.rs                     45.16% <0.00%> (-54.84%) ⬇️
src/write/stream.rs                   0.00% <0.00%> (ø)
src/read/mod.rs                       93.72% <100.00%> (-0.06%) ⬇️
src/write/mod.rs                      92.40% <100.00%> (-1.89%) ⬇️
src/encoding/hybrid_rle/mod.rs        86.36% <0.00%> (-1.57%) ⬇️
src/lib.rs                            78.62% <0.00%> (-1.54%) ⬇️
src/encoding/hybrid_rle/decoder.rs    94.11% <0.00%> (+1.96%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

sydduckworth left a comment

Looks good to me!

It would be convenient to have an interface for writing Parquet files that implements Sink, but I'd be happy to open an issue for that since it's likely out of scope for this PR.

jorgecarleitao (Owner, Author) commented

Thanks, @dexterduck! You mean futures::Sink?

sydduckworth commented
@jorgecarleitao yep! It's a relatively minor thing, but it would be nice to have an interface that implements futures::Sink<RowGroupIter<'_, _>>, where closing the sink writes the Parquet footer and then closes the underlying writer.
That provides the best of both worlds: you can write row group by row group if needed, but you can also use futures::StreamExt::forward to write out a whole stream of row groups to a file in a single call.
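
The shape of that suggestion can be sketched without any dependencies. The following is a synchronous, std-only analogue and makes several assumptions: `RowGroupSink`, `ToySink`, and the free function `forward` are hypothetical names, a row group is modelled as a plain `Vec<u32>`, and "closing the underlying writer" is approximated by a flush. The real `futures::Sink` is asynchronous and poll-based, so an actual implementation would look different.

```rust
use std::io::{self, Write};

/// Hypothetical synchronous stand-in for the sink pattern (not `futures::Sink`).
pub trait RowGroupSink {
    type Item;
    fn send(&mut self, item: Self::Item) -> io::Result<()>;
    fn close(&mut self) -> io::Result<()>;
}

pub struct ToySink<W: Write> {
    inner: W,
    row_groups: u64,
    closed: bool,
}

impl<W: Write> ToySink<W> {
    pub fn new(inner: W) -> Self {
        Self { inner, row_groups: 0, closed: false }
    }

    pub fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> RowGroupSink for ToySink<W> {
    type Item = Vec<u32>;

    /// Writes one row group; rejected once the sink has been closed.
    fn send(&mut self, item: Vec<u32>) -> io::Result<()> {
        if self.closed {
            return Err(io::Error::new(io::ErrorKind::Other, "sink is closed"));
        }
        for value in &item {
            self.inner.write_all(&value.to_le_bytes())?;
        }
        self.row_groups += 1;
        Ok(())
    }

    /// Writes a toy footer (the row-group count) exactly once, then flushes.
    fn close(&mut self) -> io::Result<()> {
        if !self.closed {
            self.inner.write_all(&self.row_groups.to_le_bytes())?;
            self.inner.flush()?;
            self.closed = true;
        }
        Ok(())
    }
}

/// Drains every item into the sink and then closes it, loosely mirroring
/// what `StreamExt::forward` does for a stream and a sink.
pub fn forward<S: RowGroupSink>(
    items: impl IntoIterator<Item = S::Item>,
    sink: &mut S,
) -> io::Result<()> {
    for item in items {
        sink.send(item)?;
    }
    sink.close()
}
```

With this shape, a caller can either call `send` once per row group and `close` at the end, or hand a whole sequence to `forward` in one call. Tying the footer to `close` means it cannot be forgotten on the one-call path.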

jorgecarleitao merged commit dde40a6 into main on Feb 2, 2022
jorgecarleitao deleted the writer branch on February 2, 2022 at 06:33

jorgecarleitao (Owner, Author) commented

Thanks for your review and suggestion, @dexterduck. I agree. Would you be able to work on the Sink? I am not familiar with that API, so it would be easier for someone who is.

sydduckworth commented
Sure, I'll give it a shot and put a PR in!
