
DeltaTable is not resilient to corrupted checkpoint state #2258

Closed
echai58 opened this issue Mar 6, 2024 · 0 comments · Fixed by #2270
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate bug Something isn't working


echai58 commented Mar 6, 2024

Environment

Delta-rs version: 0.15.3

Binding: python


Bug

What happened:
We have a Delta table where a job running a create_checkpoint operation appears to have been killed partway through, leaving the checkpoint state inconsistent:

  • both _delta_log/00000000000000001230.checkpoint.parquet and _delta_log/00000000000000001240.checkpoint.parquet exist
  • but _delta_log/_last_checkpoint contains '{"size":1232,"size_in_bytes":773002,"version":1230}'

This indicates that the checkpoint job for version 1240 was killed before _last_checkpoint was updated.
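The inconsistent state listed above can be detected from a listing of _delta_log alone. A minimal sketch, assuming the standard Delta checkpoint file naming (20-digit zero-padded version, plus optional 10-digit part numbers for multi-part checkpoints); the helper names are hypothetical and are not part of deltalake or delta-rs.

```python
import json
import re

# Assumed Delta log naming: "<20-digit version>.checkpoint.parquet", or
# "<version>.checkpoint.<10-digit part>.<10-digit num_parts>.parquet" for multi-part.
# UUID-named (v2) checkpoints are not matched by this sketch.
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(?:\.(\d{10})\.(\d{10}))?\.parquet$")


def checkpoint_versions(filenames):
    """Return the sorted, de-duplicated versions that have checkpoint files."""
    versions = set()
    for name in filenames:
        m = CHECKPOINT_RE.match(name)
        if m:
            versions.add(int(m.group(1)))
    return sorted(versions)


def checkpoints_newer_than_hint(filenames, last_checkpoint_json):
    """Versions whose checkpoint files are newer than the version in _last_checkpoint.

    A non-empty result matches the state reported in this issue: a checkpoint file
    was written, but the job died before _last_checkpoint was rewritten to point at it.
    """
    hinted = json.loads(last_checkpoint_json)["version"]
    return [v for v in checkpoint_versions(filenames) if v > hinted]


# The file listing and _last_checkpoint contents from this issue.
files = [
    "00000000000000001230.checkpoint.parquet",
    "00000000000000001240.checkpoint.parquet",
]
hint = '{"size":1232,"size_in_bytes":773002,"version":1230}'
print(checkpoints_newer_than_hint(files, hint))  # [1240]
```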

This breaks loading the table with DeltaTable(): it fails the assertion at
https://github.com/delta-io/delta-rs/blob/main/crates/core/src/kernel/snapshot/log_segment.rs#L461 with left = 2, right = 1.

What you expected to happen:
I'd expect the _delta_log to be resilient to a checkpoint job being killed midway: an interrupted checkpoint should not leave the whole DeltaTable unreadable, and checkpointing in general should be effectively atomic.

I think this could be fixed either by having the checkpoint process write through a temporary file of some kind (as writes do), or by making the read path resilient to this state.
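The first option, writing through a temporary file, can be sketched for a local filesystem as below. This is a minimal sketch assuming POSIX rename semantics; atomic_write_bytes and write_checkpoint are hypothetical names, not delta-rs APIs, and checkpoint_bytes stands in for an already-encoded parquet file. Object stores such as S3 generally offer no atomic rename, so a real implementation there would need a different mechanism.

```python
import json
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data so that readers of path see either the old file or the complete new one.

    The bytes go to a temporary file in the same directory, are flushed and fsynced,
    and only then renamed over path with os.replace, which is atomic on a POSIX
    local filesystem when both names are on the same filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def write_checkpoint(log_dir, version, checkpoint_bytes, size, size_in_bytes):
    """Hypothetical checkpoint writer: checkpoint file first, then the _last_checkpoint hint."""
    name = f"{version:020d}.checkpoint.parquet"
    atomic_write_bytes(os.path.join(log_dir, name), checkpoint_bytes)
    # Field names copied from the _last_checkpoint contents shown in this issue.
    hint = {"size": size, "size_in_bytes": size_in_bytes, "version": version}
    atomic_write_bytes(os.path.join(log_dir, "_last_checkpoint"), json.dumps(hint).encode())
```

Note the ordering: the checkpoint file is put in place before _last_checkpoint is rewritten. The temp file keeps a half-written checkpoint from ever appearing under its final name, but a crash between the two renames still leaves a complete newer checkpoint next to a stale _last_checkpoint, i.e. the same file listing reported in this issue. The temp-file approach alone therefore does not remove the need for a tolerant read path.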

@echai58 echai58 added the bug Something isn't working label Mar 6, 2024
@rtyler rtyler added the binding/python Issues for the Python package label Mar 7, 2024
@ion-elgreco ion-elgreco added the binding/rust Issues for the Rust crate label Mar 7, 2024
ion-elgreco added a commit that referenced this issue Mar 9, 2024
…2270)

# Description
We now only read the checkpoint files whose version matches the version in
_last_checkpoint.

# Related Issue(s)
- closes #2258
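The read-path behaviour described in the commit message can be modelled as follows. This is an illustrative Python sketch only: the actual change is in the Rust crate and is not reproduced here, and checkpoint_files_for_hint is a hypothetical name. The check against a "parts" field assumes the Delta protocol's optional part count in _last_checkpoint; the issue does not say that the fix performs this check.

```python
import json
import re

# Assumed naming (Delta protocol): "<20-digit version>.checkpoint.parquet" for a
# single-part checkpoint, "<version>.checkpoint.<part>.<num_parts>.parquet" for multi-part.
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(?:\.(\d{10})\.(\d{10}))?\.parquet$")


def checkpoint_files_for_hint(filenames, last_checkpoint_json):
    """Select only the checkpoint files whose version matches _last_checkpoint.

    Checkpoint files for any other version (for example a newer one left behind
    by a job killed before it updated _last_checkpoint) are ignored.
    """
    hint = json.loads(last_checkpoint_json)
    version = hint["version"]
    selected = []
    for name in filenames:
        m = CHECKPOINT_RE.match(name)
        if m and int(m.group(1)) == version:
            selected.append(name)
    # Zero-padded part numbers make lexical order equal to part order.
    selected.sort()
    # Assumption: an optional "parts" field gives the number of parts of a
    # multi-part checkpoint; without it, exactly one file is expected.
    expected = hint.get("parts") or 1
    if len(selected) != expected:
        raise ValueError(
            f"expected {expected} checkpoint file(s) for version {version}, found {len(selected)}"
        )
    return selected
```

With the listing from this issue, only the version 1230 checkpoint is selected and the orphaned version 1240 file is skipped.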
3 participants