fix: prevent writing checkpoints with a version that does not exist in table state #1863

rtyler · 2023-11-14T20:30:54Z

In some situations where the same writer is issuing append transactions using the operations API, which returns the newly created version, such as 10.

If the caller then attempts to create a checkpoint for version 10, the operation will produce an inconsistency in the _last_checkpoint file, if the callers in-memory table state has not been reloaded since the append operation completed.

In this scenario the _delta_log/ directory may contain: .
├── 00000000000000000000.json
├── 00000000000000000001.json
├── 00000000000000000002.json
├── 00000000000000000003.json
├── 00000000000000000004.json
├── 00000000000000000005.json
├── 00000000000000000006.json
├── 00000000000000000007.json
├── 00000000000000000008.json
├── 00000000000000000009.json
├── 00000000000000000010.checkpoint.parquet
├── 00000000000000000010.json
└── _last_checkpoint

While _last_checkpoint contains the following:
{"num_of_add_files":null,"parts":null,"size":342,"size_in_bytes":95104,"version":9}

This will result in an error on any attempts to read the Delta table:

>>> from deltalake import DeltaTable
>>> dt = DeltaTable('.')
[2023-11-14T18:05:59Z DEBUG deltalake_core::protocol] loading checkpoint from _delta_log/_last_checkpoint
[2023-11-14T18:05:59Z DEBUG deltalake_core::table] update with latest checkpoint CheckPoint { version: 9, size: 342, parts: None, size_in_bytes: Some(95104), num_of_add_files: None }
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tyler/venv/lib64/python3.11/site-packages/deltalake/table.py", line 253, in __init__
    self._table = RawDeltaTable(
                ^^^^^^^^^^^^^^
FileNotFoundError: Object at location /home/tyler/corrupted-table/_delta_log/00000000000000000009.checkpoint.parquet not found: No such file or directory (os error 2)
>>>

To prevent this error condition, the create_checkpoint_for() function should ensure that the provided checkpoint version (used to write the .checkpoint.parquet file) matches the table state's version (used to write the _last_checkpoint file).

This has the added benefit of helping prevent the caller from passing in a nonsensical version number that could also lead to a broken table.

wjones127 · 2023-11-14T20:38:49Z

crates/deltalake-core/src/protocol/checkpoints.rs

+        error!(
+            "create_checkpoint_for called with version {version} but table state contains: {}. The table state may need to be reloaded",
+            state.version()
+        );
+        return Err(CheckpointError::StaleTableVersion(version, state.version()).into());


Why can't we just checkout the correct version?

@wjones127 how do you mean "checkout?" I am less versed in the nuance of checkpoints. If the last version in the table is 100, is it valid to call this with _last_checkpoint of version: 5?

I honestly don't know whether we should even allow a user-specified version here at all, rather than creating a checkpoint with the version of the table state.

HOWEVER, if a writer has a stale table state creates _last_checkpoint after another writer has created a later checkpoint, this becomes confusing and out of sync anyways 😖

I guess to expand, I like the failure option here because the caller is asking us to do something here that will result in incorrect state, and we don't know what they intend or what performance concerns they have. I.e. I don't think we should reload the state here, as an example, because that would be surprising and may actually result in a later version than the user specified too

I feel like (a) failing is the easiest solution that avoids any potential new bug introduction and (b) reloading correct version could have unexpected perf impact, esp for things like streaming workloads that are managing their own table state

oh one more thing since my thoughts are disjointed here 😆 I think we already have the path you describe covered with create_checkpoint() 😄

Okay re-read the signature. Erroring makes sense to me.

I was thinking in the situation you were describing in the issue report, it sounds like the correct thing to do is to checkout the desired version first then create the checkpoint.

…le state I have seen this in a production environment where the same writer is issuing append transactions using the operations API, which returns the newly created version, such as 10. If the caller then attempts to create a checkpoint for version 10, the operation will produce an inconsistency in the `_last_checkpoint` file, if the callers in-memory table state has *not* been reloaded since the append operation completed. In this scenario the _delta_log/ directory may contain: . ├── 00000000000000000000.json ├── 00000000000000000001.json ├── 00000000000000000002.json ├── 00000000000000000003.json ├── 00000000000000000004.json ├── 00000000000000000005.json ├── 00000000000000000006.json ├── 00000000000000000007.json ├── 00000000000000000008.json ├── 00000000000000000009.json ├── 00000000000000000010.checkpoint.parquet ├── 00000000000000000010.json └── _last_checkpoint While `_last_checkpoint` contains the following: {"num_of_add_files":null,"parts":null,"size":342,"size_in_bytes":95104,"version":9} This will result in an error on any attempts to read the Delta table: >>> from deltalake import DeltaTable >>> dt = DeltaTable('.') [2023-11-14T18:05:59Z DEBUG deltalake_core::protocol] loading checkpoint from _delta_log/_last_checkpoint [2023-11-14T18:05:59Z DEBUG deltalake_core::table] update with latest checkpoint CheckPoint { version: 9, size: 342, parts: None, size_in_bytes: Some(95104), num_of_add_files: None } Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/tyler/venv/lib64/python3.11/site-packages/deltalake/table.py", line 253, in __init__ self._table = RawDeltaTable( ^^^^^^^^^^^^^^ FileNotFoundError: Object at location /home/tyler/corrupted-table/_delta_log/00000000000000000009.checkpoint.parquet not found: No such file or directory (os error 2) >>> To prevent this error condition, the create_checkpoint_for() function should ensure that the provided checkpoint version (used to write the `.checkpoint.parquet` file) matches the table state's version (used to write the `_last_checkpoint` file). This has the added benefit of helping prevent the caller from passing in a nonsensical version number that could also lead to a broken table. Sponsored-by: Scribd Inc

roeap

makes a lot of sense :).

Currently there also is a draft PR in the works to be able to handle v2 checkpoints and honour the latest recommendations from the protocol.

… oxbow can provide This change incorporates some lessons learned from identifying a production table inconsistency issue where a checkpoint would be written for a version that was newer than the loaded table state. See also delta-io/delta-rs#1863 oxbow is now a bit more conservative and will skip writing checkpoints if a version mismatch between table state and the desired version to write are present

Without reloading the table state the creation of a checkpoint may result in an inconsistent table state with a checkpoint for version X and a _last_checkpoint file for version X-1 being created (a file which does not exist). This was quite pesky to track down, so the lambda also is restructured to more cleanly create a table if one does not exist but return an error for all other errored execution See also delta-io/delta-rs#1863

rtyler added bug Something isn't working binding/rust Issues for the Rust crate labels Nov 14, 2023

github-actions bot added the crate/core label Nov 14, 2023

wjones127 reviewed Nov 14, 2023

View reviewed changes

rtyler force-pushed the checkpoint-table-state branch from 2334f7d to 6b98ff5 Compare November 14, 2023 20:40

rtyler marked this pull request as ready for review November 14, 2023 20:55

rtyler requested a review from roeap as a code owner November 14, 2023 20:55

wjones127 approved these changes Nov 14, 2023

View reviewed changes

roeap approved these changes Nov 14, 2023

View reviewed changes

rtyler merged commit cdf52df into delta-io:main Nov 14, 2023
20 of 21 checks passed

rtyler deleted the checkpoint-table-state branch November 14, 2023 23:32

rtyler added this to the Rust v0.17 milestone Nov 14, 2023

rtyler mentioned this pull request Dec 2, 2023

fix(python): make table version always latest before doing merge #1924

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent writing checkpoints with a version that does not exist in table state #1863

fix: prevent writing checkpoints with a version that does not exist in table state #1863

rtyler commented Nov 14, 2023

wjones127 Nov 14, 2023

rtyler Nov 14, 2023

rtyler Nov 14, 2023

xianwill Nov 14, 2023

rtyler Nov 14, 2023

wjones127 Nov 14, 2023

roeap left a comment

fix: prevent writing checkpoints with a version that does not exist in table state #1863

fix: prevent writing checkpoints with a version that does not exist in table state #1863

Conversation

rtyler commented Nov 14, 2023

wjones127 Nov 14, 2023

Choose a reason for hiding this comment

rtyler Nov 14, 2023

Choose a reason for hiding this comment

rtyler Nov 14, 2023

Choose a reason for hiding this comment

xianwill Nov 14, 2023

Choose a reason for hiding this comment

rtyler Nov 14, 2023

Choose a reason for hiding this comment

wjones127 Nov 14, 2023

Choose a reason for hiding this comment

roeap left a comment

Choose a reason for hiding this comment