Update FAQ to say that it is impossible to move a Delta Lake table in S3 #1293
I am unclear as to what this issue is requesting. Are you asking us to add a warning to the FAQ here https://docs.delta.io/latest/delta-faq.html#can-i-copy-my-delta-lake-table-to-another-location that it is not possible to preserve timestamps when copying a table on S3?
Or is this a feature request to add a way for time travel by timestamp to be supported for these copied tables (when timestamps are not preserved)? You are still able to time travel by version.
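For reference, the two time travel modes in PySpark look like this (a minimal sketch; the table path is made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake is on the classpath

path = "s3a://some-bucket/some-table"  # made-up location

# Time travel by version: driven purely by the transaction log contents,
# so it keeps working after the table is copied.
by_version = spark.read.format("delta").option("versionAsOf", 5).load(path)

# Time travel by timestamp: resolved against the modification times of the
# _delta_log JSON files, so it breaks if a copy changes those timestamps.
by_timestamp = (spark.read.format("delta")
                .option("timestampAsOf", "2022-08-01 00:00:00")
                .load(path))
```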
I realize this doesn't apply in your HDFS-to-S3 case @ABRB554, but as for the suggested change to the FAQ, I think it is possible to retain the original system metadata when moving objects within S3 with replication: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html#replication-scenario
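For instance, with boto3 one could compare the system metadata of a source object and its replica (a sketch only; bucket and key names are made up, and it assumes bucket replication is already configured per the linked docs):

```python
import boto3

s3 = boto3.client("s3")

key = "table/_delta_log/00000000000000000003.json"

# With replication, the replica is supposed to retain the source object's
# system metadata; a plain copy would not. Compare the timestamps to verify.
src = s3.head_object(Bucket="src-bucket", Key=key)
dst = s3.head_object(Bucket="replica-bucket", Key=key)
print(src["LastModified"], dst["LastModified"])
```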
Expanding a bit on this issue. My use case is the same as @ABRB554's. Consider the following source code:

It appears the current Delta functionality is: (I'm no expert, so someone please double-check my understanding. I also have no idea at this point how the file modification date works for compacted files, which might change the discussion below....)

Regarding proposed changes to this functionality, these are the concerns I gather from this issue and the previously linked issue:
I believe this concern is misplaced because, as seen from the code, the JSON file is already being read into the
I believe this concern can be avoided by making any proposed change optional: toggle it via configuration and keep the current behavior as the default.
Apart from S3-to-S3 replication, there is no way to do this. S3-to-S3 replication replicates the timestamp of an existing S3 object, which itself cannot be modified. This means that any user moving a table from any storage to S3 and needing time travel by timestamp to work consistently will have a problem. Presumably the problem is actually wider than this, because any other storage (not just S3) would at minimum require the user to update the file modification date of each transaction log file, which requires custom scripting outside of Delta Lake's table support. Ideally, a Delta Lake table should have sufficient integrity that all metadata required for it to function is independent of the file system implementation.
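To illustrate the limitation: even a server-side copy within S3 assigns a fresh Last-Modified, and the CopyObject API has no parameter to set it. A minimal boto3 sketch (bucket and key names made up):

```python
import boto3

s3 = boto3.client("s3")

key = "table/_delta_log/00000000000000000000.json"

# Server-side copy of a transaction log file within S3.
s3.copy_object(
    Bucket="dest-bucket",
    Key=key,
    CopySource={"Bucket": "src-bucket", "Key": key},
)

head = s3.head_object(Bucket="dest-bucket", Key=key)
# LastModified here is the time of the copy, not the source object's
# timestamp; S3 offers no way to set it explicitly.
print(head["LastModified"])
```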
If we agree that using a commit timestamp instead of a file modification date should be an optional / opt-in feature, then I would suggest that the documentation for enabling this feature could inform the user of this limitation, and that should be sufficient. In many cases, the client clock is under some kind of organisation-wide control and a single client is updating the table each day, so this is somewhat of an edge-case concern.
A scenario that comes to mind is: a commit is started at t0, data is written until time t1, and then Delta actually commits the transaction at t2. In this case, t0 (commit timestamp) and t2 (file modification date) may differ significantly. However, a counter-argument could be: what then is the value / meaning of the commit timestamp? In a traditional database we might say a transaction starts at time t0 and changes are made but only committed at time t2. It is misleading to think of the Delta commit timestamp in the transaction log as the physical commit timestamp.

Proposal

Considering these concerns, below is a proposed way to address all of them.

1 - Make sure the below changes in behavior are an optional configuration, off by default.

2 - On compaction of transaction log files, record the "real" / effective commit timestamp against the commit. Meaning: at the time of compaction, get the log file modification date and write it into the commit info in the compaction file. Possibly, use a new field added to

3 - Only override the commit timestamp with the file system timestamp on the

The above approach might improve performance, since only non-compacted files would need to check the file system modification date, whereas the current implementation always checks.

Limitations
This means any user-specific tools that move a Delta table from one storage to another would need to either first force a compaction (difficult, as these APIs are not exposed) or, more simply, modify the JSON transaction files to add the

I can't think of other limitations.... Please @allisonport-db / @zsxwing, would you give feedback on the proposed approach?

"*" If no
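For concreteness, a hypothetical sketch of that simpler pre-move step. This assumes readers would honor such a field, and the field name effectiveCommitTimestamp is invented here for illustration, since the proposal leaves the actual field unspecified:

```python
import json
import os

def stamp_commit_timestamps(log_dir: str) -> None:
    """Before moving a table, stamp each commit JSON with its current file
    modification time so time travel no longer depends on filesystem metadata.
    The field name "effectiveCommitTimestamp" is a made-up placeholder."""
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(log_dir, name)
        # Capture the mtime before rewriting the file (rewriting changes it).
        mtime_ms = int(os.path.getmtime(path) * 1000)
        with open(path) as f:
            actions = [json.loads(line) for line in f if line.strip()]
        for action in actions:
            if "commitInfo" in action:
                action["commitInfo"].setdefault("effectiveCommitTimestamp", mtime_ms)
        with open(path, "w") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
```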
The FAQ has been updated with: "Remember to copy files without changing the timestamps to ensure that the time travel with timestamps will be consistent."
This needs to include a warning that when the underlying storage is S3 it is impossible to move/create an object with the original/custom timestamp.
Or am I missing something? We are looking to move lots of data from HDFS to S3, but there is no way to preserve timestamps in this process, so we cannot move our HDFS Delta Lake tables to S3. What is more important to the business users of the data: a slight performance hit, or a complete loss of time travel?
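As an aside on the FAQ's advice quoted above: on a POSIX-style filesystem a timestamp-preserving copy is straightforward, and it is exactly this that S3 lacks. A minimal Python sketch (paths made up):

```python
import shutil

# copy2 preserves file metadata, including the modification time that
# Delta's timestamp-based time travel relies on.
shutil.copy2("/mnt/old/table/_delta_log/00000000000000000000.json",
             "/mnt/new/table/_delta_log/00000000000000000000.json")
```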
Original post:
`timestamp` in the `commitInfo` is created before we create the JSON file. Using `commitInfo.timestamp` would easily make `timestamp`s fall out of order with respect to `version`s. In addition, it's a client-side timestamp, where clock skew / incorrect clock time is more likely to happen. Moreover, if we needed to read the content of JSON files when looking up a version by timestamp, we would have to open tons of JSON files. Currently, we just need to use the file listing result, which is much faster.

Since we have updated the doc for this issue: https://docs.delta.io/latest/delta-faq.html#can-i-copy-my-delta-lake-table-to-another-location , I'm going to close this.
Originally posted by @zsxwing in #192 (comment)
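To make the quoted listing-based lookup concrete, here is a simplified sketch of the idea (not Delta's actual code; the real logic also handles out-of-order timestamps):

```python
import os
import re
from datetime import datetime

def version_for_timestamp(log_dir: str, ts: datetime) -> int:
    """Resolve a time-travel timestamp to a commit version using only a
    file listing of _delta_log, mirroring the approach described above."""
    commits = []
    for name in os.listdir(log_dir):
        m = re.fullmatch(r"(\d{20})\.json", name)  # commit files: zero-padded version
        if m:
            mtime = os.path.getmtime(os.path.join(log_dir, name))
            commits.append((int(m.group(1)), mtime))
    commits.sort()
    # Pick the latest version whose file modification time is <= the target.
    eligible = [v for v, t in commits if t <= ts.timestamp()]
    if not eligible:
        raise ValueError("Requested timestamp is before the earliest commit")
    return eligible[-1]
```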