feat(python): handle PyCapsule interface objects in write_deltalake #2534

Merged
merged 4 commits into delta-io:main from kyle/write-pycapsule
Jul 18, 2024

Conversation

kylebarron
Contributor

@kylebarron kylebarron commented May 21, 2024

Description

Adds support for the Arrow PyCapsule interface.

Since pyarrow is already a required dependency, this takes the minimal route of converting PyCapsule Interface objects into pyarrow objects. The stream conversion requires pyarrow 15 or higher (apache/arrow#39217).

This doesn't modify the existing hard-coded support for pyarrow and pandas.
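
As a rough illustration of the conversion described above, here is a minimal sketch. The helper names `capsule_kind` and `to_pyarrow` are hypothetical and are not taken from this PR's diff; the sketch assumes that `pa.RecordBatchReader.from_stream` (pyarrow 15+) and `pa.record_batch` accept objects exposing `__arrow_c_stream__` and `__arrow_c_array__` respectively.

```python
from typing import Any, Optional


def capsule_kind(data: Any) -> Optional[str]:
    """Report which PyCapsule Interface export method `data` offers, if any."""
    if hasattr(data, "__arrow_c_stream__"):
        return "stream"
    if hasattr(data, "__arrow_c_array__"):
        return "array"
    return None


def to_pyarrow(data: Any):
    """Hypothetical helper: convert a PyCapsule Interface object to pyarrow."""
    kind = capsule_kind(data)
    if kind is None:
        raise TypeError(
            f"{type(data).__name__} does not implement the Arrow PyCapsule Interface"
        )

    # Imported lazily so the dispatch logic above works without pyarrow.
    import pyarrow as pa

    if kind == "stream":
        # Assumption: from_stream exists only in pyarrow >= 15.
        if not hasattr(pa.RecordBatchReader, "from_stream"):
            raise ValueError("pyarrow 15 or later is required for stream import")
        return pa.RecordBatchReader.from_stream(data)

    # kind == "array": a struct array exported through __arrow_c_array__.
    return pa.record_batch(data)
```

Checking for the stream method first is a choice made for this sketch only; an object that exposes both methods would be read as a stream here.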

Related Issue(s)

Documentation

@github-actions github-actions bot added the binding/python Issues for the Python package label May 21, 2024

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@ion-elgreco ion-elgreco marked this pull request as draft May 27, 2024 06:53
@ion-elgreco ion-elgreco force-pushed the kyle/write-pycapsule branch from 77d03f0 to 1c805e1 Compare June 4, 2024 10:28
@ion-elgreco
Collaborator

ion-elgreco commented Jun 4, 2024

@kylebarron can you fix the linting issues? Then we can merge it.

Also, I'm wondering how we should type-hint this now, since an input may or may not have the `__arrow_c_stream__` attribute.

@ion-elgreco ion-elgreco changed the title Handle PyCapsule interface objects in write_deltalake feat(python): handle PyCapsule interface objects in write_deltalake Jun 4, 2024
@kylebarron
Contributor Author

> @kylebarron can you fix the linting issues? Then we can merge it.

I'm pretty busy, but I can try to find some time soon.

> Also, I'm wondering how we should type-hint this now, since an input may or may not have the `__arrow_c_stream__` attribute.

You can use these type hints: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#protocol-typehints
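
For reference, a sketch of what such protocol type hints could look like, adapted from the shape described on the linked page. The `@runtime_checkable` decorator, the `ArrowInput` alias, and the `is_arrow_exportable` helper are additions for illustration only and are assumptions, not part of the linked documentation or of this PR.

```python
from typing import Any, Optional, Protocol, Tuple, Union, runtime_checkable


@runtime_checkable
class ArrowArrayExportable(Protocol):
    """Objects that can export a single Arrow array (e.g. a record batch)."""

    def __arrow_c_array__(
        self, requested_schema: Optional[object] = None
    ) -> Tuple[object, object]: ...


@runtime_checkable
class ArrowStreamExportable(Protocol):
    """Objects that can export a stream of Arrow record batches."""

    def __arrow_c_stream__(
        self, requested_schema: Optional[object] = None
    ) -> object: ...


# Hypothetical alias covering both kinds of PyCapsule Interface input.
ArrowInput = Union[ArrowArrayExportable, ArrowStreamExportable]


def is_arrow_exportable(obj: Any) -> bool:
    """Runtime check that only verifies the method is present, not its signature."""
    return isinstance(obj, (ArrowArrayExportable, ArrowStreamExportable))
```

Because these are structural protocols, an input does not need to inherit from anything; a static type checker accepts any object that defines the matching method.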

@ion-elgreco
Collaborator

@kylebarron ah nice, do you mind adding those type hints when you find the time?

@kylebarron kylebarron marked this pull request as ready for review July 17, 2024 23:30
@kylebarron
Contributor Author

I believe I fixed the lint errors and the type hinting.

In the future, a more involved PR could remove pyarrow as a required dependency entirely by passing the C stream PyCapsule directly to Rust (arrow-rs has an example of how to do that here).

@ion-elgreco
Collaborator

@kylebarron that would be nice. We could potentially make pyarrow opt-in, since it would then only be needed for reading.

@ion-elgreco ion-elgreco force-pushed the kyle/write-pycapsule branch from 387b819 to dfe6e4a Compare July 18, 2024 07:01
Collaborator

@ion-elgreco ion-elgreco left a comment

Thanks for adding this!

@ion-elgreco ion-elgreco enabled auto-merge (squash) July 18, 2024 07:02
@ion-elgreco ion-elgreco merged commit 640ee6e into delta-io:main Jul 18, 2024
21 checks passed
@kylebarron kylebarron deleted the kyle/write-pycapsule branch July 18, 2024 15:23
Labels
binding/python Issues for the Python package
Development

Successfully merging this pull request may close these issues.

Support for Arrow PyCapsule interface
2 participants