feat: upgrade to Arrow 37 and Datafusion 23 #1314
This was discussed with @wjones127 in Slack
This has lots of API changes and will need follow-up work.
This feature makes it much easier to read/write JSON to parquet. This is immediately useful in kafka-delta-ingest but I believe will be much more generally useful for all consumers of the package
I figure we can use this feature for other JSONy things too
ACTION NEEDED: delta-rs follows the Conventional Commits specification. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Most of what appears to have changed can be solved by calling .into() on a Vec<ArrowField> to convert it to the Fields abstraction.
This tracks changes in arrow 37, which switched to relying on Arc<Field>.
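For anyone following along, here is a rough sketch of the conversion pattern described above. To keep it compilable with only the standard library, `Field` and `Fields` below are simplified, hypothetical stand-ins and not the real arrow types (those live in the arrow crates and carry a data type, metadata, and more). The helper names `len`, `names`, and `example_fields` are also made up for illustration. The only point is to show why a `From<Vec<Field>>` impl lets most call sites get away with a bare `.into()`.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for arrow's `Field`.
// The real type also carries a data type and metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
}

// Stand-in for the `Fields` abstraction: a shared, cheaply cloneable
// list of reference-counted fields (modelling the Arc<Field> design).
#[derive(Debug, Clone, PartialEq)]
pub struct Fields(Arc<[Arc<Field>]>);

// This impl is what makes `vec_of_fields.into()` work at call sites.
impl From<Vec<Field>> for Fields {
    fn from(fields: Vec<Field>) -> Self {
        Fields(fields.into_iter().map(Arc::new).collect())
    }
}

impl Fields {
    // Number of fields in the list.
    pub fn len(&self) -> usize {
        self.0.len()
    }

    // Field names in declaration order.
    pub fn names(&self) -> Vec<&str> {
        self.0.iter().map(|f| f.name.as_str()).collect()
    }
}

// Builds the field list the pre-upgrade way (a plain Vec) and converts
// it with `.into()`, which is the shape of most call-site fixes.
pub fn example_fields() -> Fields {
    let fields = vec![
        Field { name: "id".to_string(), nullable: false },
        Field { name: "value".to_string(), nullable: true },
    ];
    fields.into()
}
```

Because the list is behind an `Arc`, cloning a `Fields` value in this model only bumps a reference count instead of copying every field.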
pub fn arrow_schema_json(&self) -> PyResult<String> {
    let schema = self
        ._table
        .get_schema()
        .map_err(PyDeltaTableError::from_raw)?;
    serde_json::to_string(
        &<ArrowSchema as TryFrom<&deltalake::Schema>>::try_from(schema)
            .map_err(PyDeltaTableError::from_arrow)?,
    )
    .map_err(|_| PyDeltaTableError::new_err("Got invalid table schema"))
}
I don't think this was used in our public API, and I know the arrow-rs folks wanted to remove the unofficial JSON schema serialization from their public API as well. I think it should be fine to drop this.
Lots of fun API changes, Fields is the biggest impact in terms of lines of code however.

BREAKING CHANGE: new major versions for arrow and datafusion