If you already have a functioning Apache Spark configuration, you can use your own. For your convenience, the provided docker-compose.yml is based on the jupyter/pyspark-notebook image. You will need Docker and Docker Compose configured on your computer; see the Docker Desktop documentation for details.
You can run docker-compose up and follow the prompt to open the Jupyter Notebook UI (the URL will look like http://127.0.0.1:8888/?token=<SOME_TOKEN>).
The provided data/ directory is mounted as a Docker volume at ~/data/ inside the container for easy access:
import os

from pyspark.sql import SparkSession

# Start a local Spark session
spark = SparkSession.builder.master('local').getOrCreate()

# Read the sample data, treating the first row as the header and
# letting Spark infer the column types
df = spark.read.options(
    header=True,
    inferSchema=True,
    delimiter=',',
).csv(os.path.expanduser('~/data/DataSample.csv'))
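Once the DataFrame is loaded, a quick sanity check along these lines can confirm the read worked; this is only a minimal sketch and makes no assumptions about the column names in DataSample.csv:

# Inspect the inferred schema and a few rows
df.printSchema()
df.show(5, truncate=False)

# Confirm the row count
print(f'{df.count()} rows loaded')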
Please host your solution as one or more Notebooks (.ipynb) in a public remote git repository, and reply with its link to the email thread through which you originally received this work sample.