feat(rust,python): cast each parquet file to delta schema #2615
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
@HawaiianSpork can you add a test with a delta table whose parquet files contain nanosecond timestamps? Maybe just create a parquet table and then use convert to delta?
@ion-elgreco, I'd be happy to add more tests but want to make sure I create the correct ones. FYI @wjones127 and @roeap, who made the original commit that read the schema from the parquet files: #1266.
This looks promising, but I would like to update the title, if you don't mind, for the sake of the changelog. Schema evolution is typically understood in the Delta context as a change to the Delta schema (i.e. a transaction commit occurs). If I am understanding this correctly, this is more about schema adaptation of read results.
d566b9d to 27b9639
By casting the read record batch to the delta schema datafusion can read tables where the underlying parquet files can be cast to the desired schema.
27b9639 to d48425e
Conflicts: crates/core/src/delta_datafusion/mod.rs
@HawaiianSpork can you rebase so that we can merge?
@ion-elgreco I have merged with master as requested.
Description
By casting each read record batch to the Delta schema, DataFusion can read tables whose underlying parquet files can be cast to the desired schema. Fixes:
This is possible now because DataFusion exposes a SchemaAdapter that can be overridden.
Note that this causes all timestamps read by delta-rs to have microsecond precision, matching the Delta protocol.
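To illustrate the idea, here is a minimal, dependency-free sketch of casting a batch read from a file to the table schema. It is not the delta-rs or DataFusion implementation: `cast_value`, `cast_batch`, the type-name strings, and the dict-of-lists batch layout are all hypothetical, and the floor-division rounding for nanosecond to microsecond timestamps is an assumption of this sketch.

```python
NANOS_PER_MICRO = 1_000


def cast_value(value, file_type, table_type):
    """Cast one value from the file's type to the table's type (hypothetical helper)."""
    if value is None or file_type == table_type:
        return value
    if (file_type, table_type) == ("timestamp[ns]", "timestamp[us]"):
        # Drop sub-microsecond digits. Floor division is this sketch's choice,
        # not a statement about how the real cast rounds.
        return value // NANOS_PER_MICRO
    if (file_type, table_type) == ("int32", "int64"):
        # Widening an integer never loses information.
        return value
    raise TypeError(f"cannot cast {file_type} to {table_type}")


def cast_batch(batch, file_schema, table_schema):
    """Return the batch's columns in table-schema order, cast to the table's types."""
    adapted = {}
    for name, table_type in table_schema.items():
        if name not in file_schema:
            # This sketch simply rejects files that lack a table column.
            raise KeyError(f"column {name!r} missing from file")
        file_type = file_schema[name]
        adapted[name] = [cast_value(v, file_type, table_type) for v in batch[name]]
    return adapted


# A file written with nanosecond timestamps and a narrower integer type.
file_schema = {"id": "int32", "ts": "timestamp[ns]"}
# The table schema, with microsecond timestamps.
table_schema = {"ts": "timestamp[us]", "id": "int64"}

batch = {"id": [1, 2, None], "ts": [1_700_000_000_123_456_789, None, 999]}
adapted = cast_batch(batch, file_schema, table_schema)
print(adapted)
```

Every file is adapted to the same target schema on read, so files written with different but castable physical types can sit in one table, at the cost of losing sub-microsecond precision.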
Related Issue(s)