ObjectStore write support #2185
Comments
Related to #2025. cc @matthewmturner
I wonder what quality of write support you plan to provide. A production-ready implementation of data ingestion can be as large an effort as creating another project like Apache Kafka.
Thanks, @xudong963 and @wjones127. Very happy to see this. It also relates to #1777.
Basically, I would like for
This is just the "filesystem" interaction, so just reading and writing bytes to various places with a uniform API. Other "writer" related things like file formats (parquet / json / csv) would be out of scope. Does that make sense?
First of all, I am just a stranger evaluating the datafusion query engine, so I might lack some context and my point might not be valid for this case. Yes, sure, that makes sense. From what I see, a writer API adds a point of failure upstream. For example, how is it going to deal with data loss in case of a process crash, missing write permissions on the S3 bucket, etc.? An ObjectStore that only performs reads cannot corrupt the data source, and from my perspective that is great. I would suggest pushing this cross-FS implementation into the Rust Arrow repository, as C++ did; then the implementation would be even more reusable.
This issue is a little out of date. We recently switched to a new object store crate and it appears to support writes. https://docs.rs/object_store/0.5.0/object_store/trait.ObjectStore.html
Yes! I'm happy to close this, and other issues can be filed for any further integration work.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We are looking at improving the filesystem / object store support in delta-rs, but it seems like it would be better to work on that inside of datafusion's data-access crate instead of doing all that work in delta-rs. delta-rs currently has file system support for local fs, gcs, s3, and adls, but only for reading and writing whole files. I think we'll want to add streaming reads and writes.
Describe the solution you'd like
Design and implement a streaming write interface into the ObjectStore trait.
Describe alternatives you've considered
We could do that work in delta-rs and then contribute it back here later. But it might not transfer well. For example, the current delta-rs S3 filesystem uses rusoto, while the datafusion object store uses the AWS SDK.
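For illustration only, here is a minimal Rust sketch of the kind of streaming write interface requested above. The names ChunkWriter, StreamingStore, InMemoryStore, and start_write are hypothetical and are not part of datafusion, delta-rs, or the object_store crate; the in-memory backend stands in for a real one such as S3. The sketch assumes a design where chunks are buffered and the object only becomes visible on finish, which is one possible way to avoid exposing partial writes after a crash.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::io;
use std::rc::Rc;

/// Hypothetical handle for one in-progress streaming write.
trait ChunkWriter {
    /// Append one chunk of bytes to the pending object.
    fn write_chunk(&mut self, chunk: &[u8]) -> io::Result<()>;
    /// Commit the object; nothing is visible to readers before this call.
    fn finish(self: Box<Self>) -> io::Result<()>;
}

/// Hypothetical store trait with a streaming write entry point.
trait StreamingStore {
    fn start_write(&self, path: &str) -> io::Result<Box<dyn ChunkWriter>>;
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

/// Toy backend that keeps committed objects in a shared map.
#[derive(Default, Clone)]
struct InMemoryStore {
    objects: Rc<RefCell<HashMap<String, Vec<u8>>>>,
}

struct InMemoryWriter {
    path: String,
    buffer: Vec<u8>,
    objects: Rc<RefCell<HashMap<String, Vec<u8>>>>,
}

impl ChunkWriter for InMemoryWriter {
    fn write_chunk(&mut self, chunk: &[u8]) -> io::Result<()> {
        // Chunks accumulate privately until `finish` is called.
        self.buffer.extend_from_slice(chunk);
        Ok(())
    }

    fn finish(self: Box<Self>) -> io::Result<()> {
        let InMemoryWriter { path, buffer, objects } = *self;
        // Publishing is a single map insert, so readers never see a partial object.
        objects.borrow_mut().insert(path, buffer);
        Ok(())
    }
}

impl StreamingStore for InMemoryStore {
    fn start_write(&self, path: &str) -> io::Result<Box<dyn ChunkWriter>> {
        Ok(Box::new(InMemoryWriter {
            path: path.to_string(),
            buffer: Vec::new(),
            objects: Rc::clone(&self.objects),
        }))
    }

    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.borrow().get(path).cloned()
    }
}

fn main() -> io::Result<()> {
    let store = InMemoryStore::default();
    let mut writer = store.start_write("table/part-0.bin")?;
    writer.write_chunk(b"hello, ")?;
    writer.write_chunk(b"world")?;
    // Not yet committed, so a reader sees nothing at this path.
    assert!(store.get("table/part-0.bin").is_none());
    writer.finish()?;
    assert_eq!(
        store.get("table/part-0.bin").as_deref(),
        Some(&b"hello, world"[..])
    );
    Ok(())
}
```

A real backend would likely be async and would map write_chunk and finish onto something like a multipart upload, but the shape of the interface (start, write chunks, commit) would stay similar.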