fix: prevent writing checkpoints with a version that does not exist in table state #1863
Conversation
error!(
    "create_checkpoint_for called with version {version} but table state contains: {}. The table state may need to be reloaded",
    state.version()
);
return Err(CheckpointError::StaleTableVersion(version, state.version()).into());
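For context, here is a minimal self-contained sketch of the guard being discussed; the types below are simplified stand-ins for illustration, not the exact delta-rs definitions.

```rust
// Simplified stand-ins for the delta-rs types, for illustration only.
#[derive(Debug)]
pub enum CheckpointError {
    /// (requested checkpoint version, version held in the loaded table state)
    StaleTableVersion(i64, i64),
}

pub struct TableState {
    version: i64,
}

impl TableState {
    pub fn version(&self) -> i64 {
        self.version
    }
}

/// Refuse to write a checkpoint when the requested version does not match the
/// version of the in-memory table state.
pub fn guard_checkpoint_version(version: i64, state: &TableState) -> Result<(), CheckpointError> {
    if version != state.version() {
        eprintln!(
            "create_checkpoint_for called with version {version} but table state contains: {}. \
             The table state may need to be reloaded",
            state.version()
        );
        return Err(CheckpointError::StaleTableVersion(version, state.version()));
    }
    Ok(())
}
```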
Why can't we just checkout the correct version?
@wjones127 how do you mean "checkout"? I am less versed in the nuances of checkpoints. If the last version in the table is 100, is it valid to call this with a `_last_checkpoint` of `version: 5`?
I honestly don't know whether we should even allow a user-specified version here at all, rather than creating a checkpoint with the version of the table state.
HOWEVER, if a writer with a stale table state creates `_last_checkpoint` after another writer has created a later checkpoint, this becomes confusing and out of sync anyway 😖
I guess to expand: I like the failure option here because the caller is asking us to do something that will result in incorrect state, and we don't know what they intend or what performance concerns they have. For example, I don't think we should reload the state here, because that would be surprising and may actually result in a later version than the user specified.
I feel like (a) failing is the easiest solution that avoids introducing any potential new bugs, and (b) reloading the correct version could have an unexpected perf impact, especially for things like streaming workloads that are managing their own table state.
Oh, one more thing since my thoughts are disjointed here 😆 I think we already have the path you describe covered with create_checkpoint() 😄
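For illustration, a rough sketch of that path, assuming the `deltalake` crate's `open_table` and `checkpoints::create_checkpoint` entry points (module paths and signatures may differ between versions). Because `create_checkpoint()` derives the version from the table's own loaded state, the checkpoint file and `_last_checkpoint` cannot drift apart.

```rust
use deltalake::checkpoints::create_checkpoint;
use deltalake::open_table;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the table and checkpoint whatever version the loaded state holds;
    // no caller-supplied version means no chance of a mismatch.
    let table = open_table("./my-delta-table").await?;
    create_checkpoint(&table).await?;
    Ok(())
}
```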
Okay, I re-read the signature. Erroring makes sense to me.
I was thinking that in the situation you were describing in the issue report, the correct thing to do is to check out the desired version first and then create the checkpoint.
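A sketch of that "check out first, then checkpoint" approach, assuming `DeltaTable::load_version` and the same checkpoint helper as above; exact names and signatures are assumptions and may not match the current API.

```rust
use deltalake::checkpoints::create_checkpoint;
use deltalake::open_table;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut table = open_table("./my-delta-table").await?;
    // Check out the version we actually want to checkpoint so that the loaded
    // state and the requested checkpoint version agree.
    table.load_version(10).await?;
    create_checkpoint(&table).await?;
    Ok(())
}
```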
…le state

I have seen this in a production environment where the same writer is issuing append transactions using the operations API, which returns the newly created version, such as 10. If the caller then attempts to create a checkpoint for version 10, the operation will produce an inconsistency in the `_last_checkpoint` file if the caller's in-memory table state has *not* been reloaded since the append operation completed.

In this scenario the _delta_log/ directory may contain:

.
├── 00000000000000000000.json
├── 00000000000000000001.json
├── 00000000000000000002.json
├── 00000000000000000003.json
├── 00000000000000000004.json
├── 00000000000000000005.json
├── 00000000000000000006.json
├── 00000000000000000007.json
├── 00000000000000000008.json
├── 00000000000000000009.json
├── 00000000000000000010.checkpoint.parquet
├── 00000000000000000010.json
└── _last_checkpoint

While `_last_checkpoint` contains the following:

{"num_of_add_files":null,"parts":null,"size":342,"size_in_bytes":95104,"version":9}

This will result in an error on any attempt to read the Delta table:

>>> from deltalake import DeltaTable
>>> dt = DeltaTable('.')
[2023-11-14T18:05:59Z DEBUG deltalake_core::protocol] loading checkpoint from _delta_log/_last_checkpoint
[2023-11-14T18:05:59Z DEBUG deltalake_core::table] update with latest checkpoint CheckPoint { version: 9, size: 342, parts: None, size_in_bytes: Some(95104), num_of_add_files: None }
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tyler/venv/lib64/python3.11/site-packages/deltalake/table.py", line 253, in __init__
    self._table = RawDeltaTable(
                  ^^^^^^^^^^^^^^
FileNotFoundError: Object at location /home/tyler/corrupted-table/_delta_log/00000000000000000009.checkpoint.parquet not found: No such file or directory (os error 2)
>>>

To prevent this error condition, the create_checkpoint_for() function should ensure that the provided checkpoint version (used to write the `.checkpoint.parquet` file) matches the table state's version (used to write the `_last_checkpoint` file). This has the added benefit of helping prevent the caller from passing in a nonsensical version number that could also lead to a broken table.

Sponsored-by: Scribd Inc
2334f7d to 6b98ff5
Makes a lot of sense :).
There is also currently a draft PR in the works to handle v2 checkpoints and honour the latest recommendations from the protocol.
… oxbow can provide

This change incorporates some lessons learned from identifying a production table inconsistency issue where a checkpoint would be written for a version that was newer than the loaded table state. See also delta-io/delta-rs#1863

oxbow is now a bit more conservative and will skip writing checkpoints if a version mismatch between the table state and the desired version to write is present.
Without reloading the table state, creating a checkpoint may result in an inconsistent table state: a checkpoint is written for version X while the _last_checkpoint file references version X-1, whose checkpoint file does not exist. This was quite pesky to track down, so the lambda is also restructured to more cleanly create a table if one does not exist, but return an error for all other errored executions. See also delta-io/delta-rs#1863
In some situations the same writer is issuing append transactions using the operations API, which returns the newly created version, such as 10. If the caller then attempts to create a checkpoint for version 10, the operation will produce an inconsistency in the `_last_checkpoint` file if the caller's in-memory table state has not been reloaded since the append operation completed.

In this scenario the _delta_log/ directory may contain:

.
├── 00000000000000000000.json
├── 00000000000000000001.json
├── 00000000000000000002.json
├── 00000000000000000003.json
├── 00000000000000000004.json
├── 00000000000000000005.json
├── 00000000000000000006.json
├── 00000000000000000007.json
├── 00000000000000000008.json
├── 00000000000000000009.json
├── 00000000000000000010.checkpoint.parquet
├── 00000000000000000010.json
└── _last_checkpoint
While `_last_checkpoint` contains the following:

{"num_of_add_files":null,"parts":null,"size":342,"size_in_bytes":95104,"version":9}
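For reference, those fields parse into roughly the following shape; this struct is an illustrative stand-in and may not match the real delta-rs `CheckPoint` definition exactly.

```rust
use serde::Deserialize;

// Field names mirror the _last_checkpoint JSON above; treat this struct as an
// illustrative stand-in rather than the exact delta-rs definition.
#[derive(Debug, Deserialize)]
struct LastCheckpoint {
    version: i64,
    size: i64,
    parts: Option<u32>,
    size_in_bytes: Option<i64>,
    num_of_add_files: Option<i64>,
}

fn main() -> Result<(), serde_json::Error> {
    let raw = r#"{"num_of_add_files":null,"parts":null,"size":342,"size_in_bytes":95104,"version":9}"#;
    let last: LastCheckpoint = serde_json::from_str(raw)?;
    // A reader trusting this file will go looking for
    // 00000000000000000009.checkpoint.parquet, which is missing from the
    // directory listing above.
    println!("{last:?}");
    Ok(())
}
```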
This will result in an error on any attempt to read the Delta table:

FileNotFoundError: Object at location /home/tyler/corrupted-table/_delta_log/00000000000000000009.checkpoint.parquet not found: No such file or directory (os error 2)
To prevent this error condition, the create_checkpoint_for() function should ensure that the provided checkpoint version (used to write the `.checkpoint.parquet` file) matches the table state's version (used to write the `_last_checkpoint` file). This has the added benefit of helping prevent the caller from passing in a nonsensical version number that could also lead to a broken table.
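On the caller side, the safe pattern implied here is to refresh the in-memory state after the append and before checkpointing. A minimal sketch, assuming `DeltaTable::update` and `checkpoints::create_checkpoint` from the `deltalake` crate (exact APIs may vary by version):

```rust
use deltalake::checkpoints::create_checkpoint;
use deltalake::open_table;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut table = open_table("./my-delta-table").await?;

    // ... append transactions happen here via the operations API, which
    // returns a newer version (e.g. 10) than the state loaded above ...

    // Reload the in-memory state so its version matches the latest commit
    // before asking for a checkpoint; with the guard in this PR, skipping
    // this step now fails fast instead of corrupting _last_checkpoint.
    table.update().await?;
    create_checkpoint(&table).await?;
    Ok(())
}
```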