File options are ignored when writing delta #1444
Comments
Right now we expect users to cast their data types to ones Delta Lake supports. We may eventually support automatic casting. That's tracked by #686.
Gotcha, thanks! I was aware of the limitation, but the only unsupported data type I was encountering was this damn timestamp, so I hoped that the file_options would save me the work :) Should I close the issue then?
Yeah, sorry, those truncation options don't work for that. I think we'd like to fold this into the general issue for mapping data types though, rather than treat timestamps specially.
No worries, you guys are doing awesome work, much appreciated.
I am getting the same error, but I did not follow what the fix is. Can you please clarify? Thanks!
Environment
Windows 10
Python 3.10.11
Delta-rs version:
deltalake 0.9.0
pyarrow 12.0.0
numpy 1.24.3
Bug
What happened:
I'm receiving JSON data from a service that uses nanosecond-resolution timestamps, and I need to store it in Delta format. Truncated timestamps are acceptable, so I intended to simply allow truncation and coerce the timestamps to microsecond resolution. However, I end up with this error:
PyDeltaTableError: Schema error: Invalid data type for Delta Lake: Timestamp(Nanosecond, Some("UTC"))
What you expected to happen:
I expected the timestamp to be truncated and converted to microseconds.
How to reproduce it:
More details:
This is a minimal reproducible example from the pipeline I'm creating, which receives a stream of JSON arrays.