Commits on WriteMode::MergeSchema cause table metadata corruption #2468

emcake · 2024-04-30T21:47:01Z

Environment

Delta-rs version: 0.17.0 (not fixed in master)

Binding: rust

Environment: N/A

Bug

What happened: When using WriteMode::MergeSchema with RecordBatchWriter::write_with_mode, I encountered a scenario where the commit carried a metaData action that (a) removed the partition columns from the metadata, and (b) removed those columns entirely from the schema, even though the schemas matched.

What you expected to happen: The schemas should match, and the partition column information should be preserved.

How to reproduce it: Write a batch with WriteMode::MergeSchema to a table that has partition columns.

More details:

This looks like an oversight in the original schema-merging PR. The code path that handles this has a TODO for setting the partition columns: https://github.com/delta-io/delta-rs/blob/main/crates/core/src/writer/record_batch.rs#L242

This explains why the partition columns are cleared on write: the code was never written with them in mind.
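A minimal, std-only Rust model of that effect follows, assuming the metaData action can be reduced to a list of column names plus a list of partition columns. `TableMetadata`, `merged_metadata_buggy`, and `merged_metadata_preserving` are hypothetical names for illustration, not delta-rs types or functions, and the "preserving" variant is only one possible correction, not necessarily what the project adopted.

```rust
// Hypothetical, simplified stand-in for a Delta table's metaData action.
// This is NOT the delta-rs metadata type; schemas are just column names here.
#[derive(Debug, Clone, PartialEq)]
struct TableMetadata {
    schema: Vec<String>,
    partition_columns: Vec<String>,
}

// Models the behaviour described above: the new metadata is built from the
// merged schema, but the partition columns are never carried over (the TODO),
// so the committed action has an empty partition column list.
fn merged_metadata_buggy(_current: &TableMetadata, merged_schema: Vec<String>) -> TableMetadata {
    TableMetadata {
        schema: merged_schema,
        partition_columns: Vec::new(),
    }
}

// One possible correction: copy the table's existing partition columns
// into the new metadata instead of leaving them empty.
fn merged_metadata_preserving(current: &TableMetadata, merged_schema: Vec<String>) -> TableMetadata {
    TableMetadata {
        schema: merged_schema,
        partition_columns: current.partition_columns.clone(),
    }
}

fn main() {
    let current = TableMetadata {
        schema: vec!["id".to_string(), "value".to_string(), "date".to_string()],
        partition_columns: vec!["date".to_string()],
    };

    let buggy = merged_metadata_buggy(&current, current.schema.clone());
    let fixed = merged_metadata_preserving(&current, current.schema.clone());

    println!("buggy: {:?}", buggy);
    println!("fixed: {:?}", fixed);
    assert!(buggy.partition_columns.is_empty());
    assert_eq!(fixed.partition_columns, current.partition_columns);
}
```

In the model, the "buggy" result has an empty partition column list even though the input schema was unchanged, which corresponds to effect (a) in the report.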

The schema gets updated because, in the presence of partition columns, self.arrow_schema_ref and self.original_schema_ref never match. original_schema_ref is the schema of the table, whereas arrow_schema_ref is the schema of the written Parquet file, which has the partition columns stripped.
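The mismatch can be illustrated with a small std-only model, assuming a schema is just an ordered list of column names. `strip_partition_columns`, `naive_schema_changed`, and `partition_aware_schema_changed` are hypothetical helpers, not delta-rs code; the naive check only mirrors the idea of comparing arrow_schema_ref against original_schema_ref.

```rust
// Hypothetical model: a schema is an ordered list of column names.
// None of these helpers exist in delta-rs; they only illustrate the mismatch.

// The schema of a written Parquet file: partition columns are dropped because
// their values are encoded in the directory path rather than stored in the file.
fn strip_partition_columns(schema: &[&str], partition_columns: &[&str]) -> Vec<String> {
    schema
        .iter()
        .filter(|col| !partition_columns.contains(*col))
        .map(|col| col.to_string())
        .collect()
}

// Naive check: compare the full table schema with the file schema.
// With any partition column present, this reports a change even when none happened.
fn naive_schema_changed(table_schema: &[&str], file_schema: &[String]) -> bool {
    let table: Vec<String> = table_schema.iter().map(|col| col.to_string()).collect();
    table.as_slice() != file_schema
}

// Partition-aware check: strip partition columns from the table schema first,
// so both sides describe the same set of data columns.
fn partition_aware_schema_changed(
    table_schema: &[&str],
    partition_columns: &[&str],
    file_schema: &[String],
) -> bool {
    strip_partition_columns(table_schema, partition_columns).as_slice() != file_schema
}

fn main() {
    let table_schema = ["id", "value", "date"];
    let partition_columns = ["date"];

    // A batch whose columns exactly match the table, written as Parquet.
    let written = strip_partition_columns(&table_schema, &partition_columns);

    println!("file schema: {:?}", written);
    println!("naive check reports a change: {}", naive_schema_changed(&table_schema, &written));
    println!(
        "partition-aware check reports a change: {}",
        partition_aware_schema_changed(&table_schema, &partition_columns, &written)
    );
}
```

Here the naive check reports a change for an identical batch. If that file schema were then committed as the table schema, the `date` column would vanish from the metadata, which corresponds to effect (b) in the report.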


emcake added the bug Something isn't working label Apr 30, 2024

emcake mentioned this issue Apr 30, 2024

fix: return unsupported error for merging schemas in the presence of partition columns #2469

Merged
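The linked fix's title describes a guard rather than full support: refuse to merge schemas when the table has partition columns. A hedged, std-only sketch of such a guard follows; `MergeSchemaError` and `check_merge_schema_supported` are hypothetical names, and the actual change in #2469 may be structured differently.

```rust
use std::fmt;

// Hypothetical error type for illustration; not the delta-rs error enum.
#[derive(Debug, PartialEq)]
enum MergeSchemaError {
    Unsupported(String),
}

impl fmt::Display for MergeSchemaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MergeSchemaError::Unsupported(msg) => write!(f, "unsupported: {}", msg),
        }
    }
}

// Reject schema merging up front when the table is partitioned, instead of
// committing metadata that drops the partition columns.
fn check_merge_schema_supported(partition_columns: &[&str]) -> Result<(), MergeSchemaError> {
    if partition_columns.is_empty() {
        Ok(())
    } else {
        Err(MergeSchemaError::Unsupported(format!(
            "merging schemas with partition columns {:?} is not supported",
            partition_columns
        )))
    }
}

fn main() {
    // Partitioned table: the guard rejects the merge.
    match check_merge_schema_supported(&["date"]) {
        Ok(()) => println!("merge allowed"),
        Err(e) => println!("merge rejected: {}", e),
    }
    // Unpartitioned table: the merge may proceed.
    match check_merge_schema_supported(&[]) {
        Ok(()) => println!("merge allowed"),
        Err(e) => println!("merge rejected: {}", e),
    }
}
```

Failing early trades functionality for safety: writers on partitioned tables get an error instead of a silently corrupted metaData action.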

rtyler added the binding/rust Issues for the Rust crate label May 29, 2024

ion-elgreco closed this as completed Jun 7, 2024

