cleanup_metadata not respecting custom logRetentionDuration
#2180
Comments
@liamphmurphy Spark creates checkpoints by default every 10 commits; we don't do that yet. Can you create a checkpoint first before executing cleanup metadata?
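For reference, a minimal sketch of that suggestion using the Python binding (the table URI is hypothetical):

```python
from deltalake import DeltaTable

# Hypothetical table URI; substitute your own.
dt = DeltaTable("s3://my-bucket/my-table")

# delta-rs does not auto-checkpoint every 10 commits the way Spark does,
# so create one explicitly first...
dt.create_checkpoint()

# ...then remove log files older than the retention window.
dt.cleanup_metadata()
```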
@ion-elgreco Hi, good call out. Here's what I discovered after adding a checkpoint first. Using the same […]
@liamphmurphy We for some reason use the singular format (days -> day). If you want, you can open a PR to also parse the plural format :) It's a very trivial change, but I haven't found the time yet.
Would you mind pointing me to where this logic is done? I'm trying […] EDIT: Okay, I figured out that the exact format it's expecting is something like `interval 2 day`.
^ And in case anyone comes across the same issue: I'm also seeing that, for right now, you need to pass in […]. As a general question, @ion-elgreco: I need to make this change in a production environment. Do you know if it's OK to temporarily stop our writers (they're a Python lambda that runs `write_deltalake`)?
@liamphmurphy Here is where you need to change it: https://github.com/delta-io/delta-rs/blob/02fa4c29fe610842cc072efe1154e941dc3613d7/crates/core/src/table/config.rs#L511C4-L511C18; likely changing everything to `singular | plural =>` is enough. And @liamphmurphy, I think that's fine to do; you could likely also run `ALTER TABLE ... SET TBLPROPERTIES` in Spark at the same time.
Thanks 👍 I also wonder why errors are not being raised on the Python side, since my string didn't start with `interval`.
We don't do any checks on the syntax of the interval, but you could do this if you want! :)
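A hypothetical Python-side guard of the kind suggested here might look like the following (the helper name and the accepted units are assumptions for illustration, not the library's API):

```python
import re

def validate_retention_interval(value: str) -> None:
    # Hypothetical check: accept "interval <n> <unit>" with either
    # singular or plural units, and reject anything else up front.
    if not re.fullmatch(r"interval \d+ (second|minute|hour|day|week)s?", value):
        raise ValueError(f"invalid interval syntax: {value!r}")

validate_retention_interval("interval 2 day")    # ok
validate_retention_interval("2 days")            # raises ValueError
```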
Environment
Delta-rs version: python-v0.15.3
Binding: Python
Bug
What happened:
When either creating a table (using `write_deltalake`) or altering an already-created Delta table using Spark, subsequent `cleanup_metadata()` calls are not respecting a custom log retention duration. In fact, if one is provided, it doesn't seem to delete log files older than 30 days (the default) either.

What you expected to happen:
`cleanup_metadata` would succeed on files older than the custom log retention value; see below.

How to reproduce it:
First, I'm writing a test Delta Table like so:
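The original code block didn't survive the page scrape; a minimal sketch of such a write, assuming a local path, a pandas DataFrame, and the singular `interval 2 day` form discussed in the comments above:

```python
import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Singular "day": per the discussion above, delta-rs only parsed
# singular interval units at the time of this issue.
write_deltalake(
    "/tmp/test-table",
    df,
    configuration={"delta.logRetentionDuration": "interval 2 day"},
)
```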
Then, to simulate a log file getting old, I modify the first log file like so (making the last modified time for the file roughly a year old):
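(The original snippet is missing; one way to age the file, assuming the table written above:)

```python
import os
import time

# First commit of the hypothetical table written above.
log_file = "/tmp/test-table/_delta_log/00000000000000000000.json"

# Push both access and modification times back roughly a year.
year_ago = time.time() - 365 * 24 * 60 * 60
os.utime(log_file, (year_ago, year_ago))
```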
Then:
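(Presumably a call along these lines:)

```python
from deltalake import DeltaTable

dt = DeltaTable("/tmp/test-table")
dt.cleanup_metadata()
```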
And nothing happens; the first log file is not deleted.
More details:
As an aside, if I alter the table using Spark with the snippet below (doing this 10 times, creating 10 transaction logs and presumably triggering the default cleanup-metadata-on-checkpoint logic), it does correctly delete log files older than 2 days:
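The Spark snippet was also lost in the scrape; it was presumably something like the following PySpark loop (table path hypothetical). Note that Spark, unlike delta-rs at the time, accepts the plural `days`:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Each ALTER TABLE is its own commit, so ten of them cross Spark's
# default 10-commit checkpoint interval and trigger its log cleanup.
for _ in range(10):
    spark.sql("""
        ALTER TABLE delta.`/tmp/test-table`
        SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 2 days')
    """)
```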