Deletion _change_type does not appear in change data feed #2579

Closed
Sirz3chs opened this issue Jun 7, 2024 · 3 comments · Fixed by #2721
Labels
binding/python (Issues for the Python package), enhancement (New feature or request)

Comments

@Sirz3chs

Sirz3chs commented Jun 7, 2024

Environment

Delta-rs version: 0.18.0

Binding: Python


Bug

What happened:
I was testing the possibilities with CDF, and I think I ran into a bug.
No delete operations appear in the _change_type column of the results, whether I perform an overwrite or a direct delete on the Delta table.
[image: screenshot of the delta-rs CDF output]

What you expected to happen:
I was expecting some delete rows to appear in the CDF.
Reproducing the same operations with delta-spark gives this result:
[image: screenshot of the delta-spark CDF output]

How to reproduce it:
I've made a simple Jupyter notebook with examples from the documentation. Here is the Python code to reproduce:

import pandas as pd
from deltalake import write_deltalake, DeltaTable

table_path = "tmp/delta-table"

df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
write_deltalake(
    table_path,
    df,
    configuration={
        "delta.minWriterVersion": "7",
        "delta.minReaderVersion": "3",
        "delta.enableChangeDataFeed": "true"
    },
    engine="rust"
)

df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
write_deltalake(table_path, df, mode="append", engine="rust")

df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]})
write_deltalake(table_path, df, mode="overwrite", engine="rust")

dt = DeltaTable(table_path)
dt.delete(predicate="num = 11")

print(dt.load_cdf(starting_version=0).read_pandas())

More details:
I also tried update operations, and they appear fine in the CDF.
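
To make the gap easier to spot than by reading the printed frame, the change types in the feed can be tallied. This is a minimal sketch, not part of the original report: the helper names are hypothetical, and `change_types_from_table` assumes `deltalake` and `pandas` are installed and only reuses the `load_cdf(...).read_pandas()` call from the reproduction above.

```python
from collections import Counter

# Change types defined by the Delta change data feed.
EXPECTED_CHANGE_TYPES = ("insert", "update_preimage", "update_postimage", "delete")


def summarize_change_types(change_types):
    """Count how often each _change_type value occurs."""
    return Counter(change_types)


def missing_change_types(change_types, expected=EXPECTED_CHANGE_TYPES):
    """Return the expected change types that never occur, in their listed order."""
    seen = set(change_types)
    return [name for name in expected if name not in seen]


def change_types_from_table(table_path, starting_version=0):
    """Read the _change_type column from a table's CDF (hypothetical helper)."""
    # Imported lazily so the two helpers above work without deltalake installed.
    from deltalake import DeltaTable

    cdf = DeltaTable(table_path).load_cdf(starting_version=starting_version).read_pandas()
    return cdf["_change_type"].tolist()
```

Calling `missing_change_types(change_types_from_table("tmp/delta-table"))` after the steps above lists the change types that never show up in the feed. Since the reproduction performs no update, the two update types would be listed in any case; the one of interest here is `delete`.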

@Sirz3chs Sirz3chs added the bug Something isn't working label Jun 7, 2024
@ion-elgreco ion-elgreco added enhancement New feature or request and removed bug Something isn't working labels Jun 7, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Jun 7, 2024

@Sirz3chs we currently have only limited support for writing CDF files: only the update operation writes them.

Overwrites, predicate overwrites, merge, and delete don't write CDF files yet.

Fyi @rtyler
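
Given that limitation, one way to see which commits of a table are affected is to scan its history for the operations listed above. The following is only a rough sketch under stated assumptions: the function names are hypothetical, and the operation labels (`WRITE`, `DELETE`, `MERGE`), the `operationParameters["mode"]` value, and the `version` key are assumptions about what `DeltaTable.history()` returns, so check them against your own table before relying on this.

```python
# Operations that, per the comment above, did not write CDF files in 0.18.0.
# The labels below are assumptions about how commits appear in the history.
OPERATIONS_WITHOUT_CDF = {"DELETE", "MERGE"}


def lacks_cdf_files(commit):
    """Return True if a history entry looks like an operation without CDF files."""
    operation = str(commit.get("operation", "")).upper()
    if operation in OPERATIONS_WITHOUT_CDF:
        return True
    if operation == "WRITE":
        # Appends are plain inserts; only overwrites would need delete rows.
        params = commit.get("operationParameters") or {}
        return str(params.get("mode", "")).lower() == "overwrite"
    return False


def versions_lacking_cdf(history):
    """Return the sorted versions of all history entries flagged above."""
    return sorted(
        commit["version"]
        for commit in history
        if "version" in commit and lacks_cdf_files(commit)
    )
```

With a real table this could be called as `versions_lacking_cdf(DeltaTable(table_path).history())`. Predicate overwrites are not handled separately here, because how they are labelled in the history is not stated in this thread.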

@Sirz3chs
Author

Sirz3chs commented Jun 7, 2024

Thanks for your quick answer. I spent some time digging into the docs and issues but didn't find this information.

@ion-elgreco
Collaborator

The release notes mention that it was added for the update operation: https://github.com/delta-io/delta-rs/releases/tag/python-v0.18.0
