Update main DataFusion README #4903

Merged Jan 17, 2023 (9 commits). Showing changes from 3 commits.
53 changes: 29 additions & 24 deletions README.md
@@ -86,16 +86,22 @@ when DataFusion might be be suitable and unsuitable for your needs:
is primarily used directly by users as a serverless database and query system rather
than as a library for building such database systems.

- - [pola.rs](http://pola.rs): Polars is one of the fastest DataFrame libraries at the time
- of writing. Like DataFusion, it is also written in Rust but unlike DataFusion
- it does not provide SQL nor many extension points.
+ - [pola.rs](http://pola.rs): Polars is one of the fastest DataFrame
+ libraries at the time of writing. Like DataFusion, it is also
+ written in Rust and uses the Apache Arrow memory model, but unlike
+ DataFusion it does not provide SQL nor as many extension points.

- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)
is an execution engine. Like DataFusion, Velox aims to
provide a reusable foundation for building database-like systems. Unlike DataFusion,
it is written in C/C++ and does not include a SQL frontend or planning /optimization
framework.

+ - [DataBend](https://github.com/datafuselabs/databend) is a complete,
+ database system. Like DataFusion it is also written in Rust and
+ utilizes the Apache Arrow memory model, but unlike DataFusion it
+ targets end-users rather than developers of other database systems.
+
## DataFusion Community Extensions

There are a number of community projects that extend DataFusion or
@@ -118,27 +124,26 @@ provide integrations with other systems.

Here are some of the projects known to use DataFusion:

- - [Ballista] Distributed SQL Query Engine
- - [Blaze] Spark accelerator with DataFusion at its core
- - [CeresDB] Distributed Time-Series Database
- - [Cloudfuse Buzz]
- - [CnosDB] Open Source Distributed Time Series Database
- - [Cube Store]
- - [Dask SQL] Distributed SQL query engine in Python
- - [datafusion-tui] Text UI for DataFusion
- - [delta-rs] Native Rust implementation of Delta Lake
- - [Flock] Cloud database research system
- - [Kamu] Planet-scale streaming data pipeline
- - [Greptime DB] Open Source & Cloud Native Distributed Time Series Database
- - [InfluxDB IOx] Time Series Database
- - [Parseable] Log storage and observability platform
- - [qv] Quickly view your data
- - [prql-query]: Query and transform data with PRQL
- - [ROAPI]: Automatic read-only APIs for static datasets
- - [Seafowl] CDN-friendly analytical database
- - [Synnada] Streaming-first framework for data products
- - [Tensorbase]
- - [VegaFusion] Server-side acceleration for the [Vega](https://vega.github.io/) visualization grammar
+ - [Ballista](https://github.com/apache/arrow-ballista) Distributed SQL Query Engine
+ - [Blaze](https://github.com/blaze-init/blaze) Spark accelerator with DataFusion at its core
+ - [CeresDB](https://github.com/CeresDB/ceresdb) Distributed Time-Series Database
+ - [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
+ - [CnosDB](https://github.com/cnosdb/cnosdb) Open Source Distributed Time Series Database
+ - [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
+ - [Dask SQL](https://github.com/dask-contrib/dask-sql) Distributed SQL query engine in Python
+ - [datafusion-tui](https://github.com/datafusion-contrib/datafusion-tui) Text UI for DataFusion
+ - [delta-rs](https://github.com/delta-io/delta-rs) Native Rust implementation of Delta Lake
+ - [Flock](https://github.com/flock-lab/flock)
+ - [Greptime DB](https://github.com/GreptimeTeam/greptimedb) Open Source & Cloud Native Distributed Time Series Database
+ - [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
+ - [Kamu](https://github.com/kamu-data/kamu-cli/) Planet-scale streaming data pipeline
+ - [Parseable](https://github.com/parseablehq/parseable) Log storage and observability platform
+ - [qv](https://github.com/timvw/qv) Quickly view your data
+ - [ROAPI](https://github.com/roapi/roapi)
+ - [Seafowl](https://github.com/splitgraph/seafowl) CDN-friendly analytical database
+ - [Synnada](https://synnada.ai/) Streaming-first framework for data products
+ - [Tensorbase](https://github.com/tensorbase/tensorbase)
+ - [VegaFusion](https://vegafusion.io/) Server-side acceleration for the [Vega](https://vega.github.io/) visualization grammar

[ballista]: https://github.com/apache/arrow-ballista
[blaze]: https://github.com/blaze-init/blaze