Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262

Closed
liamphmurphy opened this issue Mar 7, 2024 · 2 comments · Fixed by #2396
Labels
bug Something isn't working storage/aws AWS S3 storage related


@liamphmurphy
Contributor

Environment

Delta-rs version: Python v0.16

Binding: Python

Environment:

  • Cloud provider: AWS S3 with DynamoDB locking

Bug

What happened:

To test the Rust engine, we cleared out all existing Delta tables in our non-prod environment and switched from the PyArrow engine to the Rust engine with schema merging, using this write_deltalake call:

 write_deltalake(s3_path, table, schema=pyarrow_schema, mode="append", engine="rust", partition_by=["Uid","date","hour"], schema_mode="merge", configuration={"delta.logRetentionDuration": "interval 7 day"})

Even though the Delta table was brand new, after some successful writes the Lambdas eventually started failing with "Generic DeltaTable error: Version mismatch". I believe the error comes from here:

return Err(DeltaTableError::Generic("Version mismatch".to_string()));

What you expected to happen:

Especially since we are testing with a fresh table, I'd expect all writes (not just some) to succeed, even with the new schema merge flag set.

How to reproduce it:
I was not able to reproduce this locally with a randomly generated dataset, so my guess is that it has more to do with the DynamoDB locking on S3. If you have thoughts on how I could test this better, please let me know.

Note that we have roughly 10 concurrent Lambdas that could potentially write to the same table. However, before this change we had 50 writing with PyArrow and all was well.
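One way to test this more aggressively than a single local run is a small harness that releases several writers at the same instant and tallies the outcomes by error message. This is a sketch: `hammer` and `write_once` are hypothetical names, and threads inside one process only approximate ten independent Lambdas, so it may or may not surface the same contention.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def hammer(write_once: Callable[[int], None], writers: int = 10, rounds: int = 5) -> Counter:
    """Hypothetical stress harness: `writers` threads each call write_once
    `rounds` times. A barrier releases them together to maximise contention.
    Returns a Counter mapping "ok" or the exception message to its count."""
    barrier = threading.Barrier(writers)

    def worker(worker_id: int) -> Counter:
        outcomes = Counter()
        barrier.wait()  # every writer starts at the same moment
        for r in range(rounds):
            try:
                write_once(worker_id * rounds + r)  # unique id per attempt
                outcomes["ok"] += 1
            except Exception as exc:  # group failures by their message
                outcomes[str(exc)] += 1
        return outcomes

    total = Counter()
    # One thread per writer, so all of them can reach the barrier.
    with ThreadPoolExecutor(max_workers=writers) as pool:
        for result in pool.map(worker, range(writers)):
            total.update(result)
    return total
```

To point it at a real table, `write_once` could wrap the call from this issue, e.g. `hammer(lambda i: write_deltalake(s3_path, table, mode="append", engine="rust", schema_mode="merge"), writers=10)`, run against a scratch S3 table configured with the same locking setup as the Lambdas. A non-zero count under the "Version mismatch" key would indicate the problem is concurrency rather than schema merging alone.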

@liamphmurphy liamphmurphy added the bug Something isn't working label Mar 7, 2024
@ion-elgreco ion-elgreco added the storage/aws AWS S3 storage related label Mar 7, 2024
@rtyler
Member

rtyler commented Mar 9, 2024

Does this only manifest with the schema evolution? Or are you able to see errors with append or merge writes as well?

@ion-elgreco
Collaborator

> Does this only manifest with the schema evolution? Or are you able to see errors with append or merge writes as well?

It happens with any operation when there are concurrent writers and the table state gets updated at the end.
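A toy model of that failure mode, assuming (this is not delta-rs code) that a writer which loses the race for a version retries at the next free version and then advances its in-memory snapshot expecting exactly one new version. `SharedLog`, `Snapshot`, `commit_with_retry` and `VersionMismatch` are hypothetical names; the check in `Snapshot.advance` only mirrors the intent of the "Version mismatch" line quoted in the issue, not its actual implementation.

```python
import threading
from dataclasses import dataclass, field


class VersionMismatch(Exception):
    """Hypothetical stand-in for DeltaTableError::Generic("Version mismatch")."""


@dataclass
class SharedLog:
    """Toy transaction log: version N can be written only once."""
    versions: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def latest(self) -> int:
        return len(self.versions) - 1

    def try_commit(self, version: int, actions) -> bool:
        # Put-if-absent: succeed only if `version` is the next free slot.
        with self.lock:
            if version != len(self.versions):
                return False
            self.versions.append(actions)
            return True


@dataclass
class Snapshot:
    """A writer's in-memory view of the table, loaded at some version."""
    version: int

    def advance(self, committed_version: int) -> None:
        # Incremental update assumes the new commit is exactly one step ahead.
        if committed_version != self.version + 1:
            raise VersionMismatch("Version mismatch")
        self.version = committed_version


def commit_with_retry(log: SharedLog, snapshot: Snapshot, actions) -> int:
    version = snapshot.version + 1
    while not log.try_commit(version, actions):
        version += 1  # another writer took this slot; try the next one
    # Modelled problem: the snapshot never saw the other writer's commit,
    # so advancing from N straight to N + k (k > 1) trips the check.
    snapshot.advance(version)
    return version


if __name__ == "__main__":
    log = SharedLog(versions=[{"create": True}])   # version 0
    snap = Snapshot(version=log.latest())           # writer A loads version 0
    log.try_commit(1, {"add": "b.parquet"})         # writer B wins version 1
    try:
        commit_with_retry(log, snap, {"add": "a.parquet"})  # A lands at version 2
    except VersionMismatch as exc:
        print(exc, "| log is now at version", log.latest())
```

In this model the commit is already in the log when the error is raised, so the data write itself succeeds and only the post-commit state update fails. With a single writer, `commit_with_retry` returns version 1 and the snapshot advances cleanly, which matches the observation that only some writes fail.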
