Commits on WriteMode::MergeSchema cause table metadata corruption #2468

Closed
emcake opened this issue Apr 30, 2024 · 0 comments
Labels: binding/rust (Issues for the Rust crate), bug (Something isn't working)

emcake commented Apr 30, 2024

Environment

Delta-rs version: 0.17.0 (not fixed in master)

Binding: rust

Environment: N/A


Bug

What happened: When using WriteMode::MergeSchema with RecordBatchWriter::write_with_mode, I hit a scenario where the commit had an attached metaData action that a) removed the partition columns from the metadata, and b) removed those columns entirely from the schema (even though the schemas matched).

What you expected to happen: The schemas should match, and the partition column information should not be removed.

How to reproduce it: Write a batch with WriteMode::MergeSchema against a table that has partition columns.

More details:

This looks like an oversight in the original schema-merging PR. The code that handles this case has a TODO for setting the partition columns: https://github.com/delta-io/delta-rs/blob/main/crates/core/src/writer/record_batch.rs#L242

This explains why the partition columns get zeroed on write: the code was never written with them in mind.

The schema gets updated because, in the presence of partition columns, self.arrow_schema_ref and self.original_schema_ref will never match. original_schema_ref is the schema of the table, while arrow_schema_ref is the schema of the written parquet file, which has the partition columns stripped.
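
To illustrate the mismatch, here is a minimal standalone sketch. It does not use the delta-rs or arrow types: `Field`, `Schema`, `strip_partition_columns`, `naive_schema_changed`, and `partition_aware_schema_changed` are all hypothetical names made up for this example, and the partition-aware comparison is only one possible way to avoid the false positive, not necessarily how it should be fixed in the crate.

```rust
/// Hypothetical, simplified stand-in for a schema field (not the arrow type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

/// Hypothetical stand-in for a schema: an ordered list of fields.
pub type Schema = Vec<Field>;

/// Convenience constructor for a field.
pub fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
    }
}

/// Returns a copy of `schema` without the partition columns, mimicking
/// the schema of the written parquet file as described in this issue.
pub fn strip_partition_columns(schema: &Schema, partition_columns: &[&str]) -> Schema {
    schema
        .iter()
        .filter(|f| !partition_columns.contains(&f.name.as_str()))
        .cloned()
        .collect()
}

/// Naive check: compares the table schema with the file schema directly.
/// With partition columns present this always reports a change.
pub fn naive_schema_changed(table_schema: &Schema, file_schema: &Schema) -> bool {
    table_schema != file_schema
}

/// Partition-aware check (assumed approach): strips the partition columns
/// from the table schema before comparing it with the file schema.
pub fn partition_aware_schema_changed(
    table_schema: &Schema,
    file_schema: &Schema,
    partition_columns: &[&str],
) -> bool {
    strip_partition_columns(table_schema, partition_columns) != *file_schema
}
```

For a table with columns `id`, `date`, `value` partitioned by `date`, the file schema holds only `id` and `value`, so the naive check reports a schema change on every write, while the partition-aware check does not.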
