Apache Spark Standalone Cluster on Docker

The project was featured in an article on the official MongoDB tech blog! 😱

The project also has its own article on the Towards Data Science Medium blog! ✨

Introduction

This project gives you an Apache Spark cluster in standalone mode with a JupyterLab interface, built on top of Docker. Learn Apache Spark through its Scala, Python (PySpark) and R (SparkR) APIs by running the provided Jupyter notebooks, which include examples of how to read, process and write data.


TL;DR

curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
docker-compose up

Contents

  • Quick Start
  • Tech Stack
  • Metrics
  • Contributing
  • Contributors
  • Support

Quick Start

Cluster overview

Application       URL              Description
JupyterLab        localhost:8888   Cluster interface with built-in Jupyter notebooks
Spark Driver      localhost:4040   Spark Driver web UI
Spark Master      localhost:8080   Spark Master node
Spark Worker I    localhost:8081   Spark Worker node with 1 core and 512m of memory (default)
Spark Worker II   localhost:8082   Spark Worker node with 1 core and 512m of memory (default)
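
Once the cluster is up, each interface should respond on its mapped port. A quick sanity check from the host might look like the sketch below (it assumes the default port mappings from the table above); note that the Spark Driver UI on 4040 is only served while a Spark application is actually running.

# Check that each web UI answers on its default port (HEAD request, print the status line only)
curl -sI http://localhost:8888 | head -n 1   # JupyterLab
curl -sI http://localhost:8080 | head -n 1   # Spark Master
curl -sI http://localhost:8081 | head -n 1   # Spark Worker I
curl -sI http://localhost:8082 | head -n 1   # Spark Worker II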

Prerequisites

Install Docker Engine 1.13.0+ and Docker Compose 1.10.0+ (see the Infra table under Tech Stack).

Download from Docker Hub (easier)

  1. Download the docker compose file;
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
  2. Edit the docker compose file with your favorite tech stack versions; check the supported versions in the Apps table under Tech Stack;
  3. Start the cluster;
docker-compose up
  4. Run Apache Spark code using the provided Jupyter notebooks with Scala, PySpark and SparkR examples;
  5. Stop the cluster by typing ctrl+c on the terminal (see the teardown note after this list);
  6. Run step 3 to restart the cluster.
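
Stopping with ctrl+c leaves the stopped containers (and the default network created by Docker Compose) in place; if you want to remove them as well, a full teardown is a single command:

docker-compose down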

Build from your local machine

Note: Local build is currently only supported on Linux distributions.

  1. Download the source code or clone the repository (a consolidated sketch of the whole flow follows this list);
  2. Move to the build directory;
cd build
  3. Edit the build.yml file with your favorite tech stack versions;
  4. Match those versions in the docker compose file;
  5. Build the images;
chmod +x build.sh ; ./build.sh
  6. Start the cluster;
docker-compose up
  7. Run Apache Spark code using the provided Jupyter notebooks with Scala, PySpark and SparkR examples;
  8. Stop the cluster by typing ctrl+c on the terminal;
  9. Run step 6 to restart the cluster.
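
Put together, the local-build flow looks roughly like the sketch below. The repository URL is taken from the compose file link above; the exact location of docker-compose.yml within the checkout is an assumption, so run docker-compose up from whichever directory contains it.

# clone the repository and move to the build directory (Linux only)
git clone https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker.git
cd spark-standalone-cluster-on-docker/build
# edit build.yml here and mirror those versions in the docker compose file, then build the images
chmod +x build.sh ; ./build.sh
# start the cluster from the directory that contains docker-compose.yml
docker-compose up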

Tech Stack

  • Infra

Component        Version
Docker Engine    1.13.0+
Docker Compose   1.10.0+

  • Languages and Kernels

Spark   Hadoop   Scala     Scala Kernel   Python   Python Kernel   R       R Kernel
3.x     3.2      2.12.10   0.10.9         3.7.3    7.19.0          3.5.2   1.1.1
2.x     2.7      2.11.12   0.6.0          3.7.3    7.19.0          3.5.2   1.1.1

  • Apps

Component      Version                 Docker Tag
Apache Spark   2.4.0 | 2.4.4 | 3.0.0   <spark-version>
JupyterLab     2.1.4 | 3.0.0           <jupyterlab-version>-spark-<spark-version>
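
As an illustration of the tag convention above, pulling a pinned JupyterLab 3.0.0 on Spark 3.0.0 combination might look like this; the image repository prefix and exact image names below are placeholders, use the ones referenced in docker-compose.yml.

# hypothetical pulls of a pinned version combination; replace <repository> with the
# image name prefix used in docker-compose.yml
docker pull <repository>/jupyterlab:3.0.0-spark-3.0.0
docker pull <repository>/spark-master:3.0.0
docker pull <repository>/spark-worker:3.0.0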

Metrics

Image size and download-count badges for the JupyterLab, Spark Master and Spark Worker Docker images.

Contributing

We'd love some help. To contribute, please read this file.

Contributors

A list of the amazing people who have contributed to the project in one way or another can be found in this file. This project is maintained by:

André Perez - dekoperez - [email protected]

Support

Support us on GitHub by starring this project ⭐

Support us on Patreon. 💖
