DeltaTable is not resilient to corrupted checkpoint state #2258
Labels: binding/python, binding/rust, bug
Environment
Delta-rs version: 0.15.3
Binding: python
Bug
What happened:
We have a Delta table where a job running `create_checkpoint` appears to have been killed partway through the operation, leaving the checkpoint state corrupted: both `_delta_log/00000000000000001230.checkpoint.parquet` and `_delta_log/00000000000000001240.checkpoint.parquet` exist, but `_last_checkpoint` says `{"size":1232,"size_in_bytes":773002,"version":1230}`, indicating that the checkpointing process for version 1240 was killed before `_last_checkpoint` was updated.

This breaks a call to `DeltaTable()`, which fails the assertion at https://github.com/delta-io/delta-rs/blob/main/crates/core/src/kernel/snapshot/log_segment.rs#L461, saying `left = 2, right = 1`.

What you expected to happen:
I'd expect the `_delta_log` to be resilient to a job being killed in the middle of a `checkpoint` (it should not wholly corrupt the DeltaTable), and the `checkpoint` process in general should be effectively atomic.

I think this could be fixed either by having the `checkpoint` process go through a more robust procedure involving a tmp file of some kind (like how writes work), or by making the read path resilient to this state.
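To make both ideas concrete, here is a minimal Python sketch. It is not how delta-rs implements checkpointing. It assumes a local POSIX-like filesystem (object stores have different rename semantics) and single-file checkpoints only. The helpers `orphaned_checkpoints` and `write_atomically` are hypothetical names invented for illustration. The first lists checkpoint files whose version is newer than the one recorded in `_last_checkpoint`, which is the state described above. It only flags candidates and does not check whether the Parquet file itself is complete. The second shows the tmp-file-then-rename pattern, so that a killed writer leaves behind a stray tmp file rather than a partial checkpoint.

```python
import json
import os
import re
import tempfile
from pathlib import Path

# Single-file checkpoint names: 20-digit zero-padded version + ".checkpoint.parquet".
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def orphaned_checkpoints(log_dir):
    """Return versions of checkpoint files newer than the version in _last_checkpoint."""
    log_dir = Path(log_dir)
    recorded = json.loads((log_dir / "_last_checkpoint").read_text())["version"]
    versions = []
    for entry in log_dir.iterdir():
        match = CHECKPOINT_RE.match(entry.name)
        if match and int(match.group(1)) > recorded:
            versions.append(int(match.group(1)))
    return sorted(versions)


def write_atomically(path, data):
    """Write bytes to a tmp file in the same directory, then rename it into place."""
    path = Path(path)
    # The tmp name ends in ".tmp", so it never matches CHECKPOINT_RE.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is an atomic rename when source and target share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the tmp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Against a log directory in the state from this report, `orphaned_checkpoints("path/to/table/_delta_log")` would return `[1240]`. A read-path fix could use a check like this to ignore or validate such files instead of asserting.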