schema merging doesn't work when overwriting with a predicate #2567

Closed
polivbr opened this issue Jun 3, 2024 · 2 comments · Fixed by #2623
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate bug Something isn't working

Comments

@polivbr

polivbr commented Jun 3, 2024

Environment

Delta-rs version:

0.17.4

Binding:

Python


Bug

What happened:

I attempted to update a table from a Polars DataFrame with mode="overwrite" and a predicate to use for replacement. The DataFrame had a subset of the columns that are in the table. While the rows matching the predicate are successfully replaced with the new data, the table's schema becomes the schema of the DataFrame, rather than being merged with the existing schema.

What you expected to happen:

The original table schema is preserved.
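To make the expectation concrete, here is a rough sketch of what I mean by "merged". It assumes that merging means a union of fields keyed by name; it is a plain-Python model with types written as strings, not delta-rs code, and `merge_schemas` is a made-up helper name.

```python
# Illustrative only: plain-Python model of a name-keyed schema merge.
# This is NOT delta-rs code; field types are plain strings for brevity.

def merge_schemas(existing, incoming):
    """Return the union of two {name: type} schemas, keeping existing field order.

    Raises ValueError if a field present in both has conflicting types.
    """
    merged = dict(existing)
    for name, dtype in incoming.items():
        if name in merged and merged[name] != dtype:
            raise ValueError(f"type conflict on field {name!r}")
        merged.setdefault(name, dtype)
    return merged


table_schema = {"a": "long", "b": "long", "c": "long"}  # existing table
frame_schema = {"a": "long", "b": "long"}               # DataFrame with a subset

expected = merge_schemas(table_schema, frame_schema)
print(list(expected))  # ['a', 'b', 'c']
```

Under that reading, writing a DataFrame that only has a subset of the columns should leave the table schema unchanged.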

How to reproduce it:

  1. Create a table with a set of columns
  2. Write to that same table with:
    • mode="overwrite"
    • schema_mode="merge"
    • a replacement predicate provided
    • a DataFrame containing a subset of the columns in the table
@polivbr polivbr added the bug Something isn't working label Jun 3, 2024
@ion-elgreco
Collaborator

@polivbr please create a reproducible example

@polivbr
Author

polivbr commented Jun 3, 2024

Here you go:

import polars as pl
import deltalake as dl

df = pl.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 1, 2, 2], 'c': [10, 11, 12, 13]})

df.write_delta("test_table")

df2 = pl.DataFrame({'a': [100, 200, 300], 'b': [1, 1, 1]})

df2.write_delta(
    "test_table",
    mode="overwrite",
    delta_write_options={
        "predicate": "b = 1",
        "schema_mode": "merge",
        "engine": "rust"
    }
)

table = dl.DeltaTable("test_table")
schema = table.schema()

print(schema)

# OUTPUT:
# Schema([Field(a, PrimitiveType("long"), nullable=True), Field(b, PrimitiveType("long"), nullable=True)])
#
# Note that Field c is absent
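
If it helps for a regression test, the check above could be reduced to a small helper like the one below. This is only a sketch: `missing_fields` is a hypothetical name, and the field names are hard-coded from the output above rather than read from a live table (I am assuming they could instead be pulled from the schema object, with each field exposing a `name`).

```python
def missing_fields(expected_names, actual_names):
    """Return the expected field names that are absent from the actual schema, in order."""
    actual = set(actual_names)
    return [name for name in expected_names if name not in actual]


# Field names copied from the schema printed above after the overwrite.
observed = ["a", "b"]

print(missing_fields(["a", "b", "c"], observed))  # ['c']
```

An empty list would mean the original columns survived the predicate overwrite.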

@rtyler rtyler added the binding/python Issues for the Python package label Jun 3, 2024
@ion-elgreco ion-elgreco added the binding/rust Issues for the Rust crate label Jun 5, 2024
3 participants