Skip to content

Latest commit

 

History

History
22 lines (16 loc) · 1.03 KB

README.md

File metadata and controls

22 lines (16 loc) · 1.03 KB

Spark source for Flight enabled endpoints

Build Status

This uses the new Source V2 Interface to connect to Apache Arrow Flight endpoints. It is a prototype of what is possible with Arrow Flight. The prototype has achieved 50x speed up compared to serial jdbc driver and scales with the number of Flight endpoints/spark executors being run in parallel.

It currently supports:

  • Columnar Batch reading
  • Reading in parallel many flight endpoints as Spark partitions
  • filter and project pushdown

It currently lacks:

  • support for all Spark/Arrow data types and filters
  • write interface to use DoPut to write Spark dataframes back to an Arrow Flight endpoint
  • leverage the transactional capabilities of the Spark Source V2 interface
  • publish benchmark test