fix: return unsupported error for merging schemas in the presence of partition columns #2469
Description
This change causes attempts to write to a partitioned table with MergeSchema to fail with an unsupported error, as that combination is not supported by the code.
I took a look at trying to make it work, but there isn't a quick fix. We need a merged schema definition before we start partitioning by the partition columns, otherwise the newly added columns get dropped. The schema reported for matching in `self.arrow_schema_ref` also needs to contain the partition columns, and ordering matters when comparing schemas, so we need to know the right place to insert them. I think for situations where `flush()` has already been called, we also need a function that returns an `Option<MetaData>` action to be applied to manual commits. Finally, re-using a writer in the presence of schema evolution is dangerous, as the `original_schema_ref` is never updated to match the newly changed schema.

I'd love to follow up with a fix, but in the short term I'd like to just stop others from getting bitten like I did.
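To illustrate the short-term behaviour described above, here is a minimal, std-only sketch of such a guard. It is not the code from this PR: `Schema`, `SchemaMode`, `WriteError`, `check_write`, and `field` are hypothetical stand-ins, and the schema is modelled as an ordered list of name/type pairs rather than a real Arrow schema. It also assumes the error is only raised when the incoming schema actually differs from the table schema.

```rust
/// Hypothetical stand-in for a schema: an ordered list of (name, type) pairs.
/// Comparing two `Vec`s is order-sensitive, which mirrors the point above
/// that field ordering matters when schemas are compared.
type Schema = Vec<(String, String)>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SchemaMode {
    Strict,
    Merge,
}

#[derive(Debug, PartialEq)]
enum WriteError {
    Unsupported(String),
    SchemaMismatch,
}

/// Decide whether a batch may be written (illustrative only).
/// Assumption: identical schemas are always accepted; a differing schema is
/// rejected in strict mode, and in merge mode it is rejected as unsupported
/// whenever the table has partition columns.
fn check_write(
    table_schema: &Schema,
    batch_schema: &Schema,
    partition_columns: &[String],
    mode: SchemaMode,
) -> Result<(), WriteError> {
    if table_schema == batch_schema {
        return Ok(());
    }
    match mode {
        SchemaMode::Strict => Err(WriteError::SchemaMismatch),
        SchemaMode::Merge if !partition_columns.is_empty() => Err(WriteError::Unsupported(
            "merging schemas is not supported for partitioned tables".to_string(),
        )),
        // Unpartitioned tables: a real writer would merge the schemas here.
        SchemaMode::Merge => Ok(()),
    }
}

/// Small helper to build a (name, type) pair.
fn field(name: &str, ty: &str) -> (String, String) {
    (name.to_string(), ty.to_string())
}

fn main() {
    let table = vec![field("id", "int64"), field("date", "utf8")];
    let wider = vec![
        field("id", "int64"),
        field("date", "utf8"),
        field("note", "utf8"),
    ];
    let partitions = vec!["date".to_string()];

    // Merge mode on a partitioned table with a new column: fail early.
    println!("{:?}", check_write(&table, &wider, &partitions, SchemaMode::Merge));
    // Strict mode rejects any difference, partitioned or not.
    println!("{:?}", check_write(&table, &wider, &[], SchemaMode::Strict));
}
```

Failing before any partitioning happens keeps the newly added columns from being dropped silently, which is the failure mode the description is trying to prevent.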
Related Issue(s)
#2468