work sample for data aspect, Apache Spark variant

EQWorks/ws-data-spark

Work Sample for Data Aspect, PySpark Variant

What is this for?

Environment setup

If you already have a functioning Apache Spark configuration, you can use your own. For your convenience, the provided `docker-compose.yml` is based on the `jupyter/pyspark-notebook` image. You will need Docker and Docker Compose configured on your computer; see the Docker Desktop documentation for details.

You can run `docker-compose up` and follow the prompt to open the Jupyter Notebook UI (a URL that looks like http://127.0.0.1:8888/?token=<SOME_TOKEN>).
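For reference, a compose file along these lines might look like the sketch below. This is only an illustration; the service name is made up, and the repository's own `docker-compose.yml` is authoritative:

```yaml
# Hypothetical sketch; see the repo's docker-compose.yml for the real configuration.
services:
  pyspark:
    image: jupyter/pyspark-notebook
    ports:
      - "8888:8888"               # Jupyter Notebook UI
    volumes:
      - ./data:/home/jovyan/data  # appears as ~/data/ inside the container
```

The volume mapping assumes the image's default user home (`/home/jovyan` in the Jupyter Docker Stacks images), which is what makes the data visible at `~/data/`.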

The given `data/` directory is mounted as a Docker volume at `~/data/` for easy access:

```python
import os
from pyspark.sql import SparkSession

# Start a local Spark session
spark = SparkSession.builder.master('local').getOrCreate()

# Read the sample data: first row is the header, column types are inferred
df = spark.read.options(
    header='True',
    inferSchema='True',
    delimiter=',',
).csv(os.path.expanduser('~/data/DataSample.csv'))
```
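If you want to sanity-check what the `header` and `delimiter` options mean without a running Spark session, Python's standard `csv` module behaves analogously. The snippet below uses a tiny stand-in file with hypothetical columns (not the real `DataSample.csv`):

```python
import csv
import os
import tempfile

# Write a small stand-in CSV (hypothetical columns, not the real DataSample.csv).
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerow(["id", "city"])      # header row, like header='True' in Spark
    writer.writerow(["1", "Toronto"])
    writer.writerow(["2", "Montreal"])

# DictReader treats the first row as column names, mirroring Spark's header option.
with open(path, newline="") as f:
    rows = list(csv.DictReader(f, delimiter=","))

print(rows[0]["city"])  # prints Toronto
```

Note that unlike Spark's `inferSchema`, the stdlib reader leaves every field as a string.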

(example screenshot)

Submission

Please host your solution as one or more Notebooks (.ipynb) in a public git remote repository, and reply with a link to it in the email thread through which you originally received this work sample.
