Docs on write_delta parameter delta_write_options to show how to pass WriterProperties #20739

Open
2 tasks done
lmmx opened this issue Jan 16, 2025 · 2 comments · May be fixed by #20746
Labels: bug, documentation, good first issue, P-medium, python

Comments

@lmmx

lmmx commented Jan 16, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

The docs for write_delta() describe the delta_write_options parameter as:

delta_write_options
Additional keyword arguments while writing a Delta lake Table. See a list of supported write options here.

This reads as if “keyword arguments” such as compression can be passed straight through to write_deltalake:

df.write_delta("zstd_delta", delta_write_options={"compression": "zstd"})

However, this raises an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-2d96e3d0e17b> in <cell line: 0>()
----> 1 df.write_delta("zstd_delta", delta_write_options={"compression": "zstd"})

/usr/local/lib/python3.11/dist-packages/polars/dataframe/frame.py in write_delta(self, target, mode, overwrite_schema, storage_options, delta_write_options, delta_merge_options)
   4496 
   4497             schema = delta_write_options.pop("schema", None)
-> 4498             write_deltalake(
   4499                 table_or_uri=target,
   4500                 data=data,

TypeError: write_deltalake() got an unexpected keyword argument 'compression'

You can see that the options are passed through to write_deltalake, which is documented here.
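To make the failure concrete, here is a simplified, stdlib-only sketch of the forwarding behaviour visible in the traceback. write_delta_stub and write_deltalake_stub are hypothetical stand-ins, not Polars or deltalake code. They assume, as the TypeError suggests, that every key in delta_write_options is unpacked as a keyword argument to write_deltalake, so a key that write_deltalake does not declare (such as compression) is rejected.

```python
from typing import Any, Optional


def write_deltalake_stub(
    table_or_uri: str,
    data: Any,
    *,
    mode: str = "error",
    writer_properties: Optional[object] = None,
) -> dict:
    # Hypothetical stand-in for deltalake.write_deltalake: like any Python
    # function, it only accepts the keyword arguments it declares.
    return {"target": table_or_uri, "mode": mode, "writer_properties": writer_properties}


def write_delta_stub(target: str, data: Any, delta_write_options: Optional[dict] = None) -> dict:
    # Hypothetical stand-in for DataFrame.write_delta: the options dict is
    # unpacked as **kwargs, so each key must be a write_deltalake parameter.
    options = dict(delta_write_options or {})
    return write_deltalake_stub(table_or_uri=target, data=data, **options)


# A key that the writer declares is forwarded without complaint.
ok = write_delta_stub("zstd_delta", [], {"writer_properties": "props"})

# A Parquet setting such as "compression" is not a declared keyword.
error_message = ""
try:
    write_delta_stub("zstd_delta", [], {"compression": "zstd"})
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message has the same shape as the one in the traceback: the unknown key only fails once it reaches the inner writer function.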

This works:

import deltalake

props = deltalake.WriterProperties(compression="zstd")
df.write_delta("zstd_delta", delta_write_options={"writer_properties": props})

Log output

No response

Issue description

Either the user needs to instantiate the deltalake.WriterProperties object themselves, or (preferably, I think) Polars should build it from the keyword arguments passed in.

Edit: on closer inspection, no change to Polars is needed, but clearer docs would help!

Expected behavior

This should work:

df.write_delta("zstd_delta", delta_write_options={"compression": "zstd"})

Installed versions

I ran this on Colab and saved it as a gist here for reference.


--------Version info---------
Polars:              1.19.0
Index type:          UInt32
Platform:            Linux-6.1.85+-x86_64-with-glibc2.35
Python:              3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               5.5.0
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          3.1.0
connectorx           <not installed>
deltalake            0.24.0
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
google.auth          2.27.0
great_tables         <not installed>
matplotlib           3.10.0
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.5
pandas               2.2.2
pyarrow              17.0.0
pydantic             2.10.5
pyiceberg            <not installed>
sqlalchemy           2.0.37
torch                2.5.1+cu121
xlsx2csv             <not installed>
xlsxwriter           <not installed>

lmmx added the bug, needs triage and python labels Jan 16, 2025
lmmx changed the title from “Incorrect documentation on use of write_delta parameters to write_options” to “Incorrect documentation on use of write_delta parameter delta_write_options” Jan 16, 2025
lmmx changed the title from “Incorrect documentation on use of write_delta parameter delta_write_options” to “Incorrect docs on write_delta parameter delta_write_options” Jan 16, 2025
@ion-elgreco
Contributor

@lmmx could you put in a fix to update the docs?

nameexhaustion added the documentation, good first issue and P-medium labels and removed the needs triage label Jan 16, 2025
github-project-automation bot moved this to Ready in Backlog Jan 16, 2025
@lmmx
Author

lmmx commented Jan 16, 2025

I must apologise: while a little confusing, the docs are actually not incorrect. I hadn’t had my morning coffee and misread them! I’ve filed a PR to demonstrate how to change the Parquet compression format, though (I still believe this would be helpful, as at a glance it wasn’t obvious that this required an external import).

lmmx changed the title from “Incorrect docs on write_delta parameter delta_write_options” to “Docs on write_delta parameter delta_write_options to show how to pass WriterProperties” Jan 16, 2025
Projects
Status: Ready

3 participants