hdfs support #300
Comments
@houqp I searched for HDFS libraries for Rust and found this crate: https://crates.io/crates/fs-hdfs . But it seems to need a lot of dependencies to run. Do you have any suggestions on this? |
@yjshen wrote a wrapper for libhdfs3: https://github.com/datafusion-contrib/datafusion-hdfs-native. This one is a lot leaner since it only has a C++ dependency. Perhaps you can work with him to convert that binding into its own crate? Right now it's coupled with the datafusion hdfs object store implementation. |
Good to know that. @yjshen do you have time to convert your work into a new crate? It would be very helpful for other Rust projects too. |
@houqp Do you think datafusion-contrib is the right place to hold this hdfs rust repo? or should I make it under my account? |
@yjshen up to you, since you are the author :) |
Hey @yjshen , any update on this? Currently we are using MinIO on HDFS as a workaround, but that does not seem sustainable: minio/minio#13927 . We are all counting on you now :) |
@zijie0 let's cooperate here : https://github.com/datafusion-contrib/hdfs-native |
Cool! @yjshen |
Sorry everyone, I realized that this feature might not be trivial to support.
But this will open up a whole new can of worms that I don't think is good for any project to experience. Not to mention that both approaches will increase installation complexity for end users, most of whom probably would not be experienced with it. A workaround is to mount HDFS and access it like a regular filesystem, which would let delta-rs reach HDFS that way, though this is just a suggestion. |
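To illustrate the mount-based workaround suggested above: once HDFS is mounted into the local filesystem, a Delta table is just a directory, and its transaction log can be read with ordinary file APIs. The sketch below is only an illustration under that assumption. The mount point `/mnt/hdfs` and the helper `latest_delta_version` are hypothetical names, not part of delta-rs; the only fact relied on is that Delta commit files live in `_delta_log` and are named with a 20-digit zero-padded version.

```python
import re
from pathlib import Path
from typing import Optional

# Delta commit files are named with a 20-digit zero-padded version,
# e.g. 00000000000000000003.json
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def latest_delta_version(table_path: str) -> Optional[int]:
    """Return the highest commit version found in <table_path>/_delta_log.

    Hypothetical helper: it only uses local filesystem calls, so it works on
    any path, including one backed by a mounted HDFS.
    Returns None if there is no log directory or no commit file.
    """
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return None
    versions = [
        int(m.group(1))
        for p in log_dir.iterdir()
        if (m := _COMMIT_RE.match(p.name))  # skips checkpoints and other files
    ]
    return max(versions, default=None)


if __name__ == "__main__":
    # Assumed mount point and table location; adjust to the actual setup.
    print(latest_delta_version("/mnt/hdfs/warehouse/events"))
```

Whether this is practical depends on how the mount behaves under concurrent writers, which this sketch does not address.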
I find the solution by @yjshen a really sound one, but installation will likely differ across systems. |
Hi @mingruimingrui , I've run into a similar problem: a customized HDFS version like yours. To make it worse, we use HDFS with federation, which isn't supported by native C++ implementations. Since the motivation for me to implement |
Yeah, unfortunately, a custom setup will need a custom build for native applications. I am guessing ClickHouse has the same problem as well. |
Yes, that is true for ClickHouse. For now, our hosted ClickHouse cluster can only use a single HDFS NameNode. It lacks the capability to use federated HDFS. |
DataFusion has implemented https://github.com/datafusion-contrib/datafusion-objectstore-hdfs. Does that help delta support HDFS? |
Yes. It looks like they have complete read support, but write support is incomplete. Someone could integrate that into this package. |
Description
HDFS storage support.
Use Case
A significant portion of companies dealing with big data uses HDFS as the backend storage solution of choice for long-term persistent data storage and processing. Having this would be very beneficial for the place I currently work at.