Timestamps of versions change when files are copied #192
We use the timestamps of the JSON files in … . You can check whether your file system provides a command that copies files without changing their timestamps, or write code that updates the modification timestamps of the JSON files in … .
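Following the second suggestion, here is a minimal Python sketch that resets each commit file's modification time from the `commitInfo.timestamp` stored inside it. It assumes commit files are newline-delimited JSON named by a zero-padded version number (e.g. `00000000000000000000.json`), and that the caller passes the table's log directory (conventionally `_delta_log`). `commit_timestamp_ms` and `restore_commit_mtimes` are hypothetical helper names, not part of any Delta Lake API. Commits without a `commitInfo` action are left untouched. The restored values only approximate the original modification times, since the commit timestamp and the file write time need not be identical.

```python
import json
import os
from pathlib import Path
from typing import Dict, Optional


def commit_timestamp_ms(commit_file: Path) -> Optional[int]:
    """Return commitInfo.timestamp (ms since the epoch) from one commit file, or None."""
    # Assumption: commit files are newline-delimited JSON, one action object per line.
    with commit_file.open("r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            info = json.loads(line).get("commitInfo")
            if info is not None and "timestamp" in info:
                return int(info["timestamp"])
    return None


def restore_commit_mtimes(log_dir: Path) -> Dict[str, int]:
    """Set each commit file's mtime (and atime) to its commitInfo timestamp.

    Returns a mapping of file name -> timestamp in ms for the files that were updated.
    """
    restored: Dict[str, int] = {}
    for commit_file in sorted(log_dir.glob("*.json")):
        if not commit_file.stem.isdigit():
            continue  # only <version>.json files are treated as commits
        ts_ms = commit_timestamp_ms(commit_file)
        if ts_ms is None:
            continue  # no commitInfo action: leave this file unchanged
        ts_ns = ts_ms * 1_000_000  # integer nanoseconds avoid float rounding
        os.utime(commit_file, ns=(ts_ns, ts_ns))
        restored[commit_file.name] = ts_ms
    return restored
```

Run it against the copied table's log directory once the copy has finished, for example `restore_commit_mtimes(Path("/data/copy/my_table/_delta_log"))` (hypothetical path).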
I noticed this when I tried to use timestamps for time travel. I got an error message about the timestamps and was really confused, because I intuitively expected the timestamps written to the JSON files to be used. I had to read the actual code to find out that the files' modification timestamps are used instead.
I think I will have to look into setting the modification timestamps of the JSON files programmatically, because I have to copy the files across system boundaries and I don't think the timestamps can be preserved there.
I was investigating the time travel functionality and came across this issue. We need to query our Delta Lake to see how the data looked at a point in time. The JSON file contains the timestamp of the commit: `{"commitInfo":{"timestamp":1579786725976,` Why not use this rather than the file's modification time? Then, if someone makes a mistake while replicating the data and the modification timestamps change, the Delta Lake table won't be affected.
I guess this is for performance reasons, to avoid having to open the file itself, but I agree that relying on this metadata is risky. Is this behavior mentioned in the documentation or the protocol? I couldn't find any clear statement about it.
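To make the trade-off concrete, here is a simplified Python sketch of timestamp-based version lookup that is parameterized on the timestamp source. The cheap path only stats each commit file, while the alternative proposed above opens each file and reads `commitInfo.timestamp`. This is an illustration under assumptions (commit files named by version number, newline-delimited JSON), not Delta Lake's actual implementation. `version_timestamps` and `version_as_of` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def _commit_info_ts(commit_file: Path) -> Optional[int]:
    """Read commitInfo.timestamp (ms) from a newline-delimited JSON commit file."""
    with commit_file.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                info = json.loads(line).get("commitInfo")
                if info and "timestamp" in info:
                    return int(info["timestamp"])
    return None


def version_timestamps(log_dir: Path, use_commit_info: bool = False) -> Dict[int, int]:
    """Map each commit version to a timestamp in ms since the epoch.

    use_commit_info=False: only stats each file (its modification time), no reads.
    use_commit_info=True: opens each file and reads commitInfo.timestamp.
    """
    result: Dict[int, int] = {}
    for commit_file in log_dir.glob("*.json"):
        if not commit_file.stem.isdigit():
            continue  # skip anything that is not <version>.json
        version = int(commit_file.stem)
        if use_commit_info:
            ts = _commit_info_ts(commit_file)
            if ts is None:
                continue  # no commitInfo action in this commit
        else:
            ts = commit_file.stat().st_mtime_ns // 1_000_000
        result[version] = ts
    return result


def version_as_of(timestamps: Dict[int, int], ts_ms: int) -> int:
    """Latest version committed at or before ts_ms (simplified time travel)."""
    candidates = [v for v, t in timestamps.items() if t <= ts_ms]
    if not candidates:
        raise ValueError("timestamp is before the earliest available version")
    return max(candidates)
```

Suppose a copy resets every modification time to the moment of copying. The stat-based map then makes all versions look as if they were committed at the same, recent time, so a time-travel request for an earlier timestamp finds no version. The `commitInfo`-based map is unaffected, at the cost of one file read per version.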
Great call-out, @rdettai. We will update the Delta documentation FAQ to cover this scenario; as you surmised, using the file timestamp allows faster retrieval. We'll keep this issue open until the documentation PR has been merged and published. Thanks!
Since we have updated the docs for this issue (https://docs.delta.io/latest/delta-faq.html#can-i-copy-my-delta-lake-table-to-another-location), I'm going to close this.
I have a (tiny) Delta table containing multiple versions. When I copy the Delta table files to another location, the timestamps in the history are wrong. As far as I now understand, the timestamps in the Delta log JSON files are overridden by the modification timestamps of those files.
I want to use the copy of the table as input for tests of my data-loading code. Is there a way to preserve the timestamps (i.e., to use the timestamps within the JSON files), or what would be the best approach for such tests?
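For a test fixture on a local file system, one option is to copy the table with Python's `shutil.copytree`. Its default copy function, `shutil.copy2`, also copies file metadata, including modification times, so the commit files keep their original mtimes. This is a minimal sketch; `copy_table_for_tests` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_table_for_tests(src: Path, dst: Path) -> Path:
    """Copy a local Delta table directory, keeping file modification times.

    shutil.copy2 copies file contents plus metadata such as the modification
    time. copytree already uses it by default; it is passed explicitly here
    to make the intent visible. dst must not exist yet.
    """
    shutil.copytree(src, dst, copy_function=shutil.copy2)
    return dst
```

This only helps if the source files still carry the original modification times. Git, for example, does not record modification times, so a table checked into a repository gets fresh mtimes on checkout. In that case the mtimes would need to be reset from each commit's `commitInfo.timestamp` after copying.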