Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust): make PartitionWriter public #2525

Merged
merged 3 commits into from
May 19, 2024

Conversation

adriangb
Copy link
Contributor

I would like to use PartitionWriter within my own crate. Would you mind making this public so I don't have to vendor it and all of the associated code?

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label May 18, 2024
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@adriangb adriangb changed the title Make PartitionWriter public feat(rust): Make PartitionWriter public May 18, 2024
@rtyler rtyler changed the title feat(rust): Make PartitionWriter public feat(rust): make PartitionWriter public May 19, 2024
@rtyler
Copy link
Member

rtyler commented May 19, 2024

@adriangb I'm not opposed to the idea, can you share more about how this can be used outside of the delta context, I'm curious

@adriangb
Copy link
Contributor Author

adriangb commented May 19, 2024

This isn't to use outside of the delta context necessarily, just hook in more low level and with less code churn.

I have a Rust service that does buffering of data before writing to Delta. I control all of the writes going to Delta so I want to avoid all of the overhead of checking that the schemas match, etc which happens if you go through the public APIs. I also want to take control of the write retry loop because I may have to switch to pessimistic concurrency control under high write load (take out a lock on a low latency system like Redis only at the stage of writing the commit) to avoid write contention.

Currently data is written out manually using ArrowWriter as hive partitioned parquet. So if I can hook into PartitionWriter it's relatively little code changes (actually saves me a good bit of code doing the partitioning) and I can then go implement the writing of the commits as a later step to convert to a delta table.

@rtyler rtyler enabled auto-merge (rebase) May 19, 2024 15:35
@rtyler
Copy link
Member

rtyler commented May 19, 2024

Sounds reasonable to me! I think this API is stable enough to make public in the next release, please run a cargo fmt and then this should merge

rtyler
rtyler previously approved these changes May 19, 2024
auto-merge was automatically disabled May 19, 2024 15:51

Head branch was pushed to by a user without write access

@adriangb
Copy link
Contributor Author

adriangb commented May 19, 2024

done @rtyler ! Seems like it dismissed your review :(

@ion-elgreco ion-elgreco enabled auto-merge (squash) May 19, 2024 15:54
@roeap
Copy link
Collaborator

roeap commented May 19, 2024

@rtyler @adriangb - recently I have been thinking if we should adopt the visibility crate to expose a larger api behind a feature gate and less guarantees. We uses this - so gar successfully I believe- in kernel ...

There also is some writing stuff - e.g. muti-part put from object_store on my todo list, that may affect this.

@adriangb
Copy link
Contributor Author

@roeap not sure I understood fully, are you saying that this right now should not be made public or that there is an opportunity for a more thoughtful approach to a public API within the rust crates in the future?

@ion-elgreco
Copy link
Collaborator

@roeap best to wait for object store 0.10 before working on the multi part

@aersam also already put some work there

@ion-elgreco ion-elgreco merged commit ea9c1b9 into delta-io:main May 19, 2024
40 checks passed
@roeap
Copy link
Collaborator

roeap commented May 19, 2024

@adriangb - i think we can make this public now, but maybe behind a feature gate. The visibility crate enables this.

@ion-elgreco - yes, still some things to cross off first, and we'll work from where we are, when i get to it 🙂.

@adriangb adriangb deleted the make-partition-writer-public branch May 19, 2024 16:16
@adriangb
Copy link
Contributor Author

i think we can make this public now, but maybe behind a feature gate. The visibility crate enables this.

Interesting, TIL!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants