Generalize object_store integration to all supported cloud providers #5959

Closed
winding-lines opened this issue Dec 30, 2022 · 6 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@winding-lines
Contributor

Problem description

The current integration of the object_store crate enables parquet download from S3. Generalize the integration so that parquet files can be downloaded from any of the supported object_store cloud providers.

Change the Python API to defer cloud download operations to the Rust side for the supported cloud providers. The rationale is that the Polars planner can optimize the downloads in some important use cases:

  1. for Parquet files, it can download only the columns of interest
  2. it can leverage per-column stats to further reduce downloads
  3. it can leverage the directory structure when the hive format is used (/field=value/)
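
To make point 3 concrete, here is a minimal sketch of pruning by hive-style directory names before anything is downloaded. It is not the Polars planner or the object_store API: the helper names `parse_hive_partitions` and `prune_paths` and the object keys are hypothetical, and only equality filters are handled.

```python
from typing import Dict, Iterable, List


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Extract field=value pairs from the directory components of a path."""
    partitions = {}
    # Skip the last component: it is the file name, not a partition directory.
    for segment in path.strip("/").split("/")[:-1]:
        if "=" in segment:
            field, _, value = segment.partition("=")
            partitions[field] = value
    return partitions


def prune_paths(paths: Iterable[str], wanted: Dict[str, str]) -> List[str]:
    """Keep only paths whose partition values match every requested field."""
    kept = []
    for path in paths:
        parts = parse_hive_partitions(path)
        # A path that lacks a requested field cannot be ruled out, so keep it.
        if all(parts.get(field, value) == value for field, value in wanted.items()):
            kept.append(path)
    return kept


# Hypothetical object keys, for illustration only.
paths = [
    "bucket/table/year=2022/month=12/part-0.parquet",
    "bucket/table/year=2022/month=11/part-0.parquet",
    "bucket/table/year=2021/month=12/part-0.parquet",
]
selected = prune_paths(paths, {"year": "2022", "month": "12"})
```

With a filter on `year` and `month`, only the matching file would need to be fetched; the other two keys are skipped based on their paths alone.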

@winding-lines added the enhancement label on Dec 30, 2022

@winding-lines
Contributor Author

This is running into some small issues; see apache/arrow-rs#3419.

@talawahtech

Hi, awesome feature! I see that it has made its way into the Rust codebase already. Is this the issue to watch for Python API support, or is there a separate ticket for that?

@winding-lines
Contributor Author

In progress, no ticket. There were some limitations in the GCP interface on the object_store side, so I got sidetracked for the last 2 weekends.

@talawahtech

Gotcha, thanks for the update. This is a super powerful/useful feature. Looking forward to trying it out on the Python side.

@winding-lines
Contributor Author

Made a bit more progress on this; see #6426.

@stinodego
Contributor

This has been implemented.
