apache · alamb · Jan 17, 2023 · Jan 13, 2023
diff --git a/README.md b/README.md
@@ -21,34 +21,52 @@
 
 <img src="docs/source/_static/images/DataFusion-Logo-Background-White.svg" width="256" alt="logo"/>
 
-DataFusion is an extensible query planning, optimization, and execution framework, written in
-Rust, that uses [Apache Arrow](https://arrow.apache.org) as its
+DataFusion is a very fast, extensible query engine for building high-quality, data-centric systems in
+[Rust](https://www.rust-lang.org/), using the [Apache Arrow](https://arrow.apache.org)
 in-memory format.
 
+DataFusion offers SQL and DataFrame APIs, excellent [performance](https://benchmark.clickhouse.com/), built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
+
 [![Coverage Status](https://codecov.io/gh/apache/arrow-datafusion/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow-datafusion?branch=master)
 
 ## Features
 
-- SQL query planner with support for multiple SQL dialects
-- DataFrame API
-- Parquet, CSV, JSON, and Avro file formats are supported natively. Custom
-  file formats can be supported by implementing a `TableProvider` trait.
-- Supports popular object stores, including AWS S3, Azure Blob
-  Storage, and Google Cloud Storage. There are extension points for implementing
-  custom object stores.
+- Feature-rich [SQL support](https://arrow.apache.org/datafusion/user-guide/sql/index.html) and [DataFrame API](https://arrow.apache.org/datafusion/user-guide/dataframe.html)
+- Blazingly fast, vectorized, multi-threaded, streaming execution engine.
+- Native support for Parquet, CSV, JSON, and Avro file formats. Support
+  for custom file formats and non-file data sources via the `TableProvider` trait.
+- Many extension points: user-defined scalar/aggregate/window functions, DataSources, SQL,
+  other query languages, custom plan and execution nodes, optimizer passes, and more.
+- Streaming, asynchronous IO directly from popular object stores, including AWS S3,
+  Azure Blob Storage, and Google Cloud Storage. Other storage systems are supported via the
+  `ObjectStore` trait.
+- [Excellent Documentation](https://docs.rs/datafusion/latest) and a
+  [welcoming community](https://arrow.apache.org/datafusion/community/communication.html).
+- A state-of-the-art query optimizer with projection and filter pushdown, sort-aware optimizations,
+  automatic join reordering, expression coercion, and more.
+- Permissive Apache 2.0 License, Apache Software Foundation governance
+- Written in [Rust](https://www.rust-lang.org/), a modern systems language with development
+  productivity similar to Java or Go, the performance of C++, and
+  [loved by programmers everywhere](https://insights.stackoverflow.com/survey/2021#technology-most-loved-dreaded-and-wanted).
 
 ## Use Cases
 
-DataFusion is modular in design with many extension points and can be
-used without modification as an embedded query engine and can also provide
-a foundation for building new systems. Here are some example use cases:
+DataFusion can be used without modification as an embedded SQL
+engine or can be customized and used as a foundation for
+building new systems. Here are some examples of systems built using DataFusion:
+
+- Specialized analytical database systems such as [CeresDB] and more general Spark-like systems such as [Ballista]
+- New query language engines such as [prql-query] and accelerators such as [VegaFusion]
+- Research platforms for new database systems, such as [Flock]
+- SQL support for another library, such as [dask sql]
+- Streaming data platforms such as [Synnada]
+- Tools for reading / sorting / transcoding Parquet, CSV, Avro, and JSON files such as [qv]
+- Faster Spark runtime replacements such as [Blaze]
 
-- DataFusion can be used as a SQL query planner and query optimizer, providing
-  optimized logical plans that can then be mapped to other execution engines.
-- DataFusion is used to create modern, fast and efficient data
-  pipelines, ETL processes, and database systems, which need the
-  performance of Rust and Apache Arrow and want to provide their users
-  the convenience of an SQL interface or a DataFrame API.
+By using DataFusion, these projects are free to focus on their specific
+features and avoid reimplementing general (but still necessary)
+features such as an expression representation, standard optimizations,
+execution plans, and file format support.
 
 ## Why DataFusion?
 
@@ -57,9 +75,31 @@ a foundation for building new systems. Here are some example use cases:
 - _Easy to Embed_: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific use case
 - _High Quality_: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be used as the foundation for production systems.
 
+## Comparisons with other projects
+
+Here is a comparison with similar projects that may help you understand
+when DataFusion might or might not be suitable for your needs:
+
+- [DuckDB](http://www.duckdb.org) is an open-source, in-process analytic database.
+  Like DataFusion, it supports very fast execution, both from its custom file format
+  and directly from Parquet files. Unlike DataFusion, it is written in C/C++ and
+  is primarily used directly by users as a serverless database and query system rather
+  than as a library for building such database systems.
+
+- [pola.rs](http://pola.rs): Polars is one of the fastest DataFrame libraries at the time
+  of writing. Like DataFusion, it is written in Rust, but unlike DataFusion
+  it provides neither SQL support nor many extension points.
+
+- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)
+  is an execution engine. Like DataFusion, Velox aims to
+  provide a reusable foundation for building database-like systems. Unlike DataFusion,
+  it is written in C/C++ and does not include a SQL frontend or planning/optimization
+  framework.
+
 ## DataFusion Community Extensions
 
-There are a number of community projects that extend DataFusion or provide integrations with other systems.
+There are a number of community projects that extend DataFusion or
+provide integrations with other systems.
 
 ### Language Bindings
 
@@ -78,29 +118,51 @@ There are a number of community projects that extend DataFusion or provide integ
 
 Here are some of the projects known to use DataFusion:
 
-- [Ballista](https://github.com/apache/arrow-ballista) Distributed SQL Query Engine
-- [Blaze](https://github.com/blaze-init/blaze) Spark accelerator with DataFusion at its core
-- [CeresDB](https://github.com/CeresDB/ceresdb) Distributed Time-Series Database
-- [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
-- [CnosDB](https://github.com/cnosdb/cnosdb) Open Source Distributed Time Series Database
-- [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
-- [Dask SQL](https://github.com/dask-contrib/dask-sql) Distributed SQL query engine in Python
-- [datafusion-tui](https://github.com/datafusion-contrib/datafusion-tui) Text UI for DataFusion
-- [delta-rs](https://github.com/delta-io/delta-rs) Native Rust implementation of Delta Lake
-- [Flock](https://github.com/flock-lab/flock)
-- [Greptime DB](https://github.com/GreptimeTeam/greptimedb) Open Source & Cloud Native Distributed Time Series Database
-- [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
-- [Parseable](https://github.com/parseablehq/parseable) Log storage and observability platform
-- [qv](https://github.com/timvw/qv) Quickly view your data
-- [ROAPI](https://github.com/roapi/roapi)
-- [Seafowl](https://github.com/splitgraph/seafowl) CDN-friendly analytical database
-- [Synnada](https://synnada.ai/) Streaming-first framework for data products
-- [Tensorbase](https://github.com/tensorbase/tensorbase)
-- [VegaFusion](https://vegafusion.io/) Server-side acceleration for the [Vega](https://vega.github.io/) visualization grammar
-
-(if you know of another project, please submit a PR to add a link!)
-
-## Example Usage
+- [Ballista] Distributed SQL Query Engine
+- [Blaze] Spark accelerator with DataFusion at its core
+- [CeresDB] Distributed Time-Series Database
+- [Cloudfuse Buzz]
+- [CnosDB] Open Source Distributed Time Series Database
+- [Cube Store]
+- [Dask SQL] Distributed SQL query engine in Python
+- [datafusion-tui] Text UI for DataFusion
+- [delta-rs] Native Rust implementation of Delta Lake
+- [Flock] Cloud database research system
+- [Kamu] Planet-scale streaming data pipeline
+- [Greptime DB] Open Source & Cloud Native Distributed Time Series Database
+- [InfluxDB IOx] Time Series Database
+- [Parseable] Log storage and observability platform
+- [qv] Quickly view your data
+- [prql-query] Query and transform data with PRQL
+- [ROAPI] Automatic read-only APIs for static datasets
+- [Seafowl] CDN-friendly analytical database
+- [Synnada] Streaming-first framework for data products
+- [Tensorbase]
+- [VegaFusion] Server-side acceleration for the [Vega](https://vega.github.io/) visualization grammar
+
+[ballista]: https://github.com/apache/arrow-ballista
+[blaze]: https://github.com/blaze-init/blaze
+[ceresdb]: https://github.com/CeresDB/ceresdb
+[cloudfuse buzz]: https://github.com/cloudfuse-io/buzz-rust
+[cnosdb]: https://github.com/cnosdb/cnosdb
+[cube store]: https://github.com/cube-js/cube.js/tree/master/rust
+[dask sql]: https://github.com/dask-contrib/dask-sql
+[datafusion-tui]: https://github.com/datafusion-contrib/datafusion-tui
+[delta-rs]: https://github.com/delta-io/delta-rs
+[flock]: https://github.com/flock-lab/flock
+[kamu]: https://github.com/kamu-data/kamu-cli
+[greptime db]: https://github.com/GreptimeTeam/greptimedb
+[influxdb iox]: https://github.com/influxdata/influxdb_iox
+[parseable]: https://github.com/parseablehq/parseable
+[prql-query]: https://github.com/prql/prql-query
+[qv]: https://github.com/timvw/qv
+[roapi]: https://github.com/roapi/roapi
+[seafowl]: https://github.com/splitgraph/seafowl
+[synnada]: https://synnada.ai/
+[tensorbase]: https://github.com/tensorbase/tensorbase
+[vegafusion]: https://vegafusion.io/ "if you know of another project, please submit a PR to add a link!"
+
+## Examples
 
 Please see the [example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html) in the user guide and the [datafusion-examples](https://github.com/apache/arrow-datafusion/tree/master/datafusion-examples) crate for more information on how to use DataFusion.
 

diff --git a/docs/source/user-guide/introduction.md b/docs/source/user-guide/introduction.md
@@ -26,7 +26,7 @@ in-memory format.
 DataFusion supports both an SQL and a DataFrame API for building
 logical query plans as well as a query optimizer and execution engine
 capable of parallel execution against partitioned data sources (CSV
-and Parquet) using threads.
+and Parquet) using
 
 ## Use Cases