CDC support in deltalog when writing delta table #2720

Closed
dsalv opened this issue Aug 1, 2024 · 6 comments
@dsalv

dsalv commented Aug 1, 2024

Description

The current delta-rs version does not support CDC in the Delta log. It says that CDC support will be added in Version 4 here.

What is the current timeline for this feature? How many months or years might it take to add CDC support?

Use Case
We need to read CDC data from the Delta log of a Delta table written using delta-rs.

Related Issue(s)

@dsalv dsalv added the enhancement New feature or request label Aug 1, 2024
@ion-elgreco
Collaborator

We already have CDC write support for update operations, @rtyler started looking at MERGE.

@dsalv
Author

dsalv commented Aug 1, 2024

We already have CDC write support for update operations, @rtyler started looking at MERGE.

That's awesome! Could you share some links?

@mkp-jansen

We already have CDC write support for update operations, @rtyler started looking at MERGE.

Would be awesome to have it for MERGE!

@waddahAldrobi

waddahAldrobi commented Aug 6, 2024

@ion-elgreco @rtyler

According to the below, insert-only operations can be efficiently computed from the transaction log.
https://docs.delta.io/latest/delta-change-data-feed.html#change-data-storage

At least that's what Databricks claims that it can do.
https://docs.databricks.com/en/delta/delta-change-data-feed.html#change-data-storage

I tried setting up a table with delta.enableChangeDataFeed = true, but the inserts were no longer registering in the delta table.

I'm not sure if my experiment was wrong, but do you know if this is supported?

Just to clarify, I'm talking about _change_type = insert
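
For illustration, here is a minimal sketch of the idea in the linked docs: when a commit only appends data, the inserted rows are exactly the rows of the data files named by that commit's `add` actions, so no separate change-data files are needed. The sketch assumes the standard Delta log layout (`_delta_log/<20-digit version>.json`, one JSON action per line). The helper name `insert_only_commits` is hypothetical and is not part of delta-rs.

```python
import json
from pathlib import Path


def insert_only_commits(table_path):
    """Map each insert-only commit version to the data files it added.

    Hypothetical helper, not a delta-rs API. It reads only the JSON commit
    files and ignores checkpoints, so treat it as a sketch for small tables.
    """
    result = {}
    log_dir = Path(table_path) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain commit file
        added, insert_only = [], True
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "remove" in action or "cdc" in action:
                # Updates, deletes, and merges rewrite or remove files,
                # so the commit is not a pure insert.
                insert_only = False
            elif "add" in action and action["add"].get("dataChange", True):
                added.append(action["add"]["path"])
        if insert_only and added:
            result[int(commit.stem)] = added
    return result
```

Calling `insert_only_commits("test_table")` on a table that has only been appended to should return one entry per commit, each listing the Parquet files whose rows would carry `_change_type = insert`.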

@ion-elgreco
Collaborator

@waddahAldrobi probably something on your side, this code works fine:

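from deltalake import DeltaTable
import polars as pl

df = pl.DataFrame({"foo": [1, 2], "bar": ["1", "2"]})

# Two appends, each creating one commit with the change data feed enabled.
write_options = {"configuration": {"delta.enableChangeDataFeed": "true"}, "engine": "rust"}
for _ in range(2):
    df.write_delta("test_table", mode="append", delta_write_options=write_options)

# Load the change data feed as a Polars DataFrame.
dt = DeltaTable("test_table")
pl.from_arrow(dt.load_cdf())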

shape: (4, 5)
┌─────┬─────┬──────────────┬─────────────────┬─────────────────────────┐
│ foo ┆ bar ┆ _change_type ┆ _commit_version ┆ _commit_timestamp       │
│ --- ┆ --- ┆ ---          ┆ ---             ┆ ---                     │
│ i64 ┆ str ┆ str          ┆ i64             ┆ datetime[ms]            │
╞═════╪═════╪══════════════╪═════════════════╪═════════════════════════╡
│ 1   ┆ 1   ┆ insert       ┆ 1               ┆ 2024-08-10 17:42:19.836 │
│ 2   ┆ 2   ┆ insert       ┆ 1               ┆ 2024-08-10 17:42:19.836 │
│ 1   ┆ 1   ┆ insert       ┆ 0               ┆ 2024-08-10 17:42:15.639 │
│ 2   ┆ 2   ┆ insert       ┆ 0               ┆ 2024-08-10 17:42:15.639 │
└─────┴─────┴──────────────┴─────────────────┴─────────────────────────┘
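
The output above lists version 1 before version 0, so a consumer that wants to replay changes in commit order may need to regroup the rows first. The following is a minimal sketch of that step, working on plain dictionaries (for example, what a Polars `to_dicts()` call would give). The helper name `rows_by_commit` is hypothetical, and the sample rows are copied from the output above with timestamps left out.

```python
from collections import defaultdict


def rows_by_commit(change_rows):
    """Group change-feed rows by commit version, oldest commit first.

    Hypothetical helper: `change_rows` is any iterable of dicts that carry
    the `_commit_version` and `_change_type` columns shown above.
    """
    grouped = defaultdict(list)
    for row in change_rows:
        grouped[row["_commit_version"]].append(row)
    return [(version, grouped[version]) for version in sorted(grouped)]


# Sample rows copied from the output above (timestamps omitted for brevity).
sample = [
    {"foo": 1, "bar": "1", "_change_type": "insert", "_commit_version": 1},
    {"foo": 2, "bar": "2", "_change_type": "insert", "_commit_version": 1},
    {"foo": 1, "bar": "1", "_change_type": "insert", "_commit_version": 0},
    {"foo": 2, "bar": "2", "_change_type": "insert", "_commit_version": 0},
]

# Walk the commits in ascending version order and count inserts per commit.
for version, rows in rows_by_commit(sample):
    inserts = [r for r in rows if r["_change_type"] == "insert"]
    print(version, len(inserts))
```

With the sample rows this prints `0 2` followed by `1 2`, one line per commit.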

@rtyler rtyler self-assigned this Aug 10, 2024
@rtyler rtyler added this to the Rust v1.0.0 milestone Aug 10, 2024
@waddahAldrobi

Thanks @ion-elgreco this is what we needed! 🙏

@rtyler rtyler modified the milestones: Rust v1.0.0, v0.23 Jan 1, 2025