Make table partitioning accessible in register/read APIs #1185

Closed
5 tasks
rdettai opened this issue Oct 28, 2021 · 5 comments
Labels
enhancement, help wanted

Comments

@rdettai
Contributor

rdettai commented Oct 28, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The ability to read partitioned tables was added to ListingTable in #1141. This is a very common feature that would be worth exposing in the highest-level APIs, such as ExecutionContext.read_xxx and ExecutionContext.register_xxx.

Describe the solution you'd like
The cleanest solution would probably be to have a partitioning field in CsvReadOptions/AvroReadOptions and also make read_parquet use a ParquetReadOptions.

@jimexist
Member

@rdettai I closed #1220; maybe we can think of a way to accommodate the two styles.

@Igosuki
Contributor

Igosuki commented Dec 15, 2021

Hi guys, coming back to this now. Since we can already find partitions in object store paths, and looking at the API, I think what is missing is the ability to infer ListingOptions::table_partition_cols from the base path given to the TableProvider.
The table provider should then be able to easily merge the two schemas (the one from the object store paths and the one from the file format) for the exec.
Am I getting this right?
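To make the inference idea concrete, here is a rough standard-library-only sketch, not DataFusion code. It assumes Hive-style `key=value` directory segments under the base path; the function names (`parse_partitions`, `infer_partition_cols`, `merge_schema`) are hypothetical, and appending the partition columns after the file columns is an assumption made for illustration.

```rust
/// Extract Hive-style `key=value` segments from `file_path`, relative to `base`.
/// Returns `None` if the file is not located under the base path.
/// (Hypothetical helper for illustration; not part of DataFusion.)
fn parse_partitions<'a>(base: &str, file_path: &'a str) -> Option<Vec<(&'a str, &'a str)>> {
    // Require the file to sit under `base/`, so "/data/tbl2" does not match "/data/tbl".
    let rest = file_path.strip_prefix(base.trim_end_matches('/'))?;
    let rest = rest.strip_prefix('/')?;
    let mut segments: Vec<&str> = rest.split('/').collect();
    segments.pop(); // drop the file name
    // Keep only directory segments of the form `key=value`.
    Some(segments.into_iter().filter_map(|s| s.split_once('=')).collect())
}

/// Infer partition column names from a set of file paths.
/// Returns `None` if the list is empty, a file is outside the base path,
/// or the files disagree on the partition keys.
fn infer_partition_cols(base: &str, files: &[&str]) -> Option<Vec<String>> {
    let mut cols: Option<Vec<String>> = None;
    for &file in files {
        let keys: Vec<String> = parse_partitions(base, file)?
            .into_iter()
            .map(|(k, _)| k.to_string())
            .collect();
        if let Some(existing) = cols.as_ref() {
            if *existing != keys {
                return None; // inconsistent layout across files
            }
            continue;
        }
        cols = Some(keys);
    }
    cols
}

/// Merge the file schema with the partition columns by appending the latter.
/// (Assumption: partition columns come after the file columns.)
fn merge_schema(file_fields: &[&str], partition_cols: &[String]) -> Vec<String> {
    file_fields
        .iter()
        .map(|f| f.to_string())
        .chain(partition_cols.iter().cloned())
        .collect()
}
```

With a base path of `/data/tbl`, a file at `/data/tbl/year=2021/month=10/part-0.parquet` would yield the columns `year` and `month`, treated as strings here.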

@houqp
Member

houqp commented Dec 20, 2021

@Igosuki I think the idea is to extend these public APIs to allow users to explicitly specify table partition columns (i.e. the table_partition_cols field) using either the DataFrame or the SQL interface. The automatic partition discovery feature can be enabled when no explicit partitioning is specified by the user.

I think there is also value in making partition columns statically typed instead of only treating them as strings. But this is out of scope for this issue.
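The "explicit first, discovery as fallback" behavior could look roughly like the sketch below. `ReadOptionsSketch` and its methods are hypothetical names used only for illustration; they are not DataFusion's actual option structs, and the discovery step is passed in as a closure so the example stays self-contained.

```rust
/// Hypothetical read options (illustration only, not DataFusion's API).
#[derive(Debug, Default, Clone)]
struct ReadOptionsSketch {
    /// `Some(cols)` when the user specified partition columns explicitly
    /// (possibly an empty list); `None` when nothing was specified.
    table_partition_cols: Option<Vec<String>>,
}

impl ReadOptionsSketch {
    /// Builder-style setter for explicit partition columns.
    fn with_table_partition_cols(mut self, cols: &[&str]) -> Self {
        self.table_partition_cols = Some(cols.iter().map(|c| c.to_string()).collect());
        self
    }

    /// Explicit columns win; `discover` only runs when none were given.
    fn resolve_partition_cols<F>(&self, discover: F) -> Vec<String>
    where
        F: FnOnce() -> Vec<String>,
    {
        match &self.table_partition_cols {
            Some(cols) => cols.clone(),
            None => discover(),
        }
    }
}
```

Using an `Option` rather than an empty vector lets a user explicitly ask for "no partition columns" without triggering discovery.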

@houqp added the good first issue and help wanted labels Dec 20, 2021
@rdettai
Contributor Author

rdettai commented Dec 26, 2021

@houqp not sure I would tag this as a good first issue, as it requires changing the API and a fair amount of plumbing (i.e. touching quite a few files).

@houqp removed the good first issue label Dec 26, 2021
@alamb
Contributor

alamb commented Feb 3, 2025

I believe ListingOptions::table_partition_cols now implements what is described in this ticket, so closing and claiming success.

https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingOptions.html#structfield.table_partition_cols

@alamb closed this as completed Feb 3, 2025

5 participants