FFI for Arrow C Stream Interface #1348
Labels
arrow
Changes to the arrow crate
enhancement
Any new improvement worthy of an entry in the changelog
help wanted
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Enable receiving/sending a stream of Record Batches from/to another Arrow implementation. For example, datafusion-contrib/datafusion-python#21 could benefit from a way to import a RecordBatchReader into Rust so it can be used by DataFusion.
Describe the solution you'd like
It might be worth implementing the Arrow C Stream interface, which allows exporting a stream of record batches. This could enable PyArrow conversion between a PyArrow RecordBatchReader and some structure on the Rust side (an iterator of Record Batches?).
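For reference, here is a rough sketch of how the `ArrowArrayStream` struct from the C Stream interface spec could be mirrored in Rust. The five fields follow the C definition (`get_schema`, `get_next`, `get_last_error`, `release`, `private_data`). Everything else is an assumption made for illustration: the Rust type names, the opaque stand-ins for `ArrowSchema` and `ArrowArray`, the `Drop` impl, and the hypothetical `demo_release` callback. None of this is the arrow crate's actual API.

```rust
use std::os::raw::{c_char, c_int, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Opaque stand-ins for the C Data Interface structs; their real layouts
/// are omitted from this sketch.
#[repr(C)]
pub struct ArrowSchema {
    _private: [u8; 0],
}

#[repr(C)]
pub struct ArrowArray {
    _private: [u8; 0],
}

/// Rust mirror of the C `struct ArrowArrayStream`.
/// `Option<fn>` is pointer-sized, so `None` maps to a NULL callback.
#[repr(C)]
pub struct ArrowArrayStream {
    pub get_schema:
        Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowSchema) -> c_int>,
    pub get_next:
        Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowArray) -> c_int>,
    pub get_last_error:
        Option<unsafe extern "C" fn(*mut ArrowArrayStream) -> *const c_char>,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArrayStream)>,
    pub private_data: *mut c_void,
}

impl Drop for ArrowArrayStream {
    fn drop(&mut self) {
        // A consumer must call `release` exactly once; a NULL `release`
        // means the stream was already released (or never initialized).
        if let Some(release) = self.release {
            unsafe { release(self) };
        }
    }
}

/// Counts calls to `demo_release` (illustration only).
pub static RELEASE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical producer-side release callback. A real producer would
/// also free whatever `private_data` points to.
pub unsafe extern "C" fn demo_release(stream: *mut ArrowArrayStream) {
    if stream.is_null() {
        return;
    }
    RELEASE_CALLS.fetch_add(1, Ordering::SeqCst);
    // Per the spec, the callback marks the struct as released.
    unsafe { (*stream).release = None };
}
```

An iterator of Record Batches on the Rust side would then wrap a struct like this and call `get_next` until the producer signals the end of the stream by returning an already-released array.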
Describe alternatives you've considered
We can already use FFI to bring over individual record batches. In datafusion-contrib/datafusion-python#21, I experimented with just wrapping a Python iterator and moving each batch individually, but encountered some issues with deadlocks in the Python GIL.
Additional context
The Arrow C Stream interface was introduced in August 2020, in apache/arrow#8052. So far it has been used to send record batch streams to DuckDB from the R and Python implementations.