Contributing to Ballista

First steps

Please read my article How to build a modern distributed compute platform, which is a good introduction to how I think Ballista (and other distributed compute platforms) should work. The article is a work in progress that I update from time to time as I learn more about the subject or when I feel motivated to write.

There is also a wiki with a list of interesting reading material.

This project depends on some existing technologies, so it is a good idea to learn a little about those too:

Apache Arrow
DataFusion
Kubernetes
gRPC

Ballista will extend DataFusion to support distributed execution of DataFusion queries by providing the following components:

Serde code to support serialization and deserialization of logical and physical query plans in protocol buffer format (so that full or partial query plans can be sent between processes).
Executor process implementing the Flight protocol that can receive a query plan and execute it.
Shuffle write operator that can store the partitioned output of a query in the executor’s memory, or persist to the file system.
Shuffle read operator that can read shuffle partitions from other executors in the cluster.
Distributed query planner / scheduler that will start with a DataFusion physical plan and insert shuffle read and write operators as necessary and then execute the query stages.
Kubernetes/Etcd support so that clusters can easily be created.
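To make the shuffle write and shuffle read components more concrete, here is a minimal, self-contained sketch of hash-partitioned shuffling. It is illustrative only and is not Ballista's API: the `Row` type and the `partition_for`, `shuffle_write` and `shuffle_read` functions are hypothetical, the real operators work on Arrow record batches rather than toy tuples, and real shuffle output lives in executor memory or on the file system and is fetched over the network. The key property it shows is that every row with the same key lands in the same output partition, regardless of which executor wrote it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical row type (key, value); the real operators use Arrow RecordBatches.
type Row = (String, i64);

/// Choose an output partition for a key by hashing it.
/// `num_partitions` must be greater than zero.
fn partition_for(key: &str, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Shuffle write (sketch): split one executor's input into `num_partitions` buckets.
fn shuffle_write(input: &[Row], num_partitions: usize) -> Vec<Vec<Row>> {
    let mut buckets: Vec<Vec<Row>> = vec![Vec::new(); num_partitions];
    for row in input {
        buckets[partition_for(&row.0, num_partitions)].push(row.clone());
    }
    buckets
}

/// Shuffle read (sketch): gather output partition `p` from every upstream writer.
fn shuffle_read(writers: &[Vec<Vec<Row>>], p: usize) -> Vec<Row> {
    writers.iter().flat_map(|w| w[p].iter().cloned()).collect()
}

fn main() {
    // Two executors, each holding part of the input.
    let exec_a: Vec<Row> = vec![("a".to_string(), 1), ("b".to_string(), 2)];
    let exec_b: Vec<Row> = vec![("a".to_string(), 3), ("c".to_string(), 4)];
    let writers = vec![shuffle_write(&exec_a, 2), shuffle_write(&exec_b, 2)];

    // Each downstream task reads one partition; both "a" rows end up together.
    for p in 0..2 {
        let rows = shuffle_read(&writers, p);
        println!("partition {p}: {rows:?}");
    }
}
```

A distributed planner would insert this write/read pair at each stage boundary (for example, before a hash join or a final aggregate) so that downstream tasks see all rows for a given key.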

Introduce Yourself!

We have a Gitter IM room for discussing this project, as well as a Discord channel.

Issues

See the current milestones and issues here. I recommend starting here when contributing because there is a plan in place for delivering useful point solutions along the way as the project heads towards a v1.0 release.

Creating Pull Requests

This project uses the standard GitHub Forking Workflow.

Development Environment

See the developer docs for instructions on setting up a local build environment.
