feat(python): add parameter to DeltaTable.to_pyarrow_dataset() #2465
Conversation
@adriangb it might be better to just pass through …
They both seem useful, right? It seems like the …
@adriangb that's true. If you can fix the tests then we can merge.
It looks like the test only fails on older pyarrow versions, and only for the map type. How about I split it in two and skip the failing one on pyarrow < 10?
@adriangb can you fix the tests? Then we can merge it :)
@@ -1022,6 +1022,8 @@ def to_pyarrow_dataset(
         partitions: Optional[List[Tuple[str, str, Any]]] = None,
         filesystem: Optional[Union[str, pa_fs.FileSystem]] = None,
         parquet_read_options: Optional[ParquetReadOptions] = None,
+        schema: Optional[pyarrow.Schema] = None,
+        as_large_types: bool = False,
The doc description is missing for this param. I would also mention that if a schema is passed, it takes precedence over as_large_types.
done!
Thanks @adriangb!
Otherwise there is no way to union this with another dataset.