JupyterHub-Fastbook

Deploy JupyterHub to your Kubernetes cluster pre-loaded with fast.ai's Practical Deep Learning for Coders course notebooks, Jupyter, and all required dependencies (installed in a conda environment) for an all-in-one, automated, repeatable "It Just Works™" deployment with no setup required.

For those who lead a team, scale out by deploying the environment to multiple users at once via JupyterHub, hosted on your own Kubernetes cluster.

This is a standalone deployment which can be extended or used as-is for your own multi-user Jupyter workflows.

*See the Further Reading section for more details on the above-mentioned technologies.



Quickstart

Running the Docker image locally

# Note: the `latest` tag is used here for expediency. When possible, you should
# pin your version by specifying an exact Docker image tag,
# e.g., `TAG=v20201007-7890c25`
TAG=latest
docker run -p 8888:8888 teozosa/jupyterhub-fastbook:${TAG}
Note: this will automatically pull the image from Docker Hub if it is not already present on your machine; the image is fairly large (~5 GB), so this may take a while.
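If you want the changes you make to the notebooks to survive container restarts, you can mount a local directory into the container. The /home/jovyan/work path below is an assumption based on the jupyter/minimal-notebook base image (see footnote [1] in the Overview); adjust it if your image uses a different home directory:

# Persist notebook work on the host
# (the in-container path is an assumption based on the jupyter/minimal-notebook base image)
mkdir -p "${PWD}/work"
docker run -p 8888:8888 -v "${PWD}/work:/home/jovyan/work" teozosa/jupyterhub-fastbook:${TAG}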

Follow the directions on-screen to log in to your local Jupyter notebook environment! 🎉

Note: the first URL may not work. If that happens, try the URL beginning with http://127.0.0.1

Important: When running the fast.ai notebooks, be sure to switch the notebook kernel to the fastbook environment.
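As an optional sanity check, you can confirm the fastbook conda environment exists inside the running container. The docker ps filter below assumes the container was started from the teozosa/jupyterhub-fastbook image as shown above:

# List conda environments inside the running container; `fastbook` should appear in the output
docker exec -it "$(docker ps -q --filter ancestor=teozosa/jupyterhub-fastbook:${TAG})" conda env list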

Deploying JupyterHub to Your Kubernetes Cluster

Please see the unabridged Kubernetes deployment section for an in-depth explanation of the steps below.

From the root of your repository, on the command line, run:

# Generate and store secret token for later usage
echo "export PROXY_SECRET=$(openssl rand -hex 32)" > .env

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
# Verify Helm
helm list
# Add JupyterHub Helm charts
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update

# Deploy the JupyterHub service mesh onto your Kubernetes cluster
# using the secret token you generated in step 1.
# Note: the `latest` tag is used here for expediency. When possible, you should
# pin your version by specifying an exact Docker image tag,
# e.g., `TAG=v20201007-7890c25`
make deploy TAG=latest

You should then be greeted by a Helm status message confirming that the JupyterHub release was deployed.

Check that all the pods are running

kubectl --namespace jhub get all

Get the JupyterHub server address

JUPYTERHUB_IP=$(kubectl --namespace jhub get service proxy-public -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $JUPYTERHUB_IP
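If the command above prints an empty address (e.g., your cluster has no LoadBalancer implementation), a port-forward to the proxy-public service is a workable stopgap; the service port below (80) is an assumption based on the default Zero to JupyterHub chart:

# Fallback: forward the proxy-public service to localhost:8080
kubectl --namespace jhub port-forward service/proxy-public 8080:80
# then browse to http://127.0.0.1:8080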

Type the IP from the previous step into your browser, log in*, and you should now be in the JupyterLab UI! 🎉

* JupyterHub runs with a default dummy authenticator, so entering any username and password combination will let you into the hub.

Important: When running the fast.ai notebooks, be sure to switch the notebook kernel to the fastbook environment.



Overview

Benefits

  1. Immediately get started on the fast.ai Practical Deep Learning for Coders course without any extra setup via the JupyterHub-Fastbook Docker image[1]
  2. Deploy JupyterHub (with the JupyterHub-Fastbook Docker image) to your own Kubernetes cluster[0]
  3. Roll your own JupyterHub deployment:
    • Use the deployment as-is; you get a fully-featured JupyterHub deployment that just so happens to have fast.ai's Practical Deep Learning for Coders course dependencies pre-loaded.
    • Extend the configuration and deployment system in this project for your particular needs.
    • Build and push your own JupyterHub-Fastbook images to your own Docker registry.

[0] Tested with Microk8s on Ubuntu 18.04.4.

[1] Based on the official jupyter/minimal-notebook from Jupyter Docker Stacks. This means you get the same features as a default JupyterHub deployment, with the added functionality of an isolated fastbook conda environment.

Example Uses

Use JupyterHub-Fastbook in conjunction with the fast.ai Practical Deep Learning for Coders course:

  1. To go through the course on your own with virtually no setup by running the JupyterHub-Fastbook Docker image locally.
  2. As the basis for a study group.
  3. To onboard new junior members of your organization's AI/ML team.

Or anything else you can think of!

Why This Project?

The purpose of this project is to reduce the initial technical barriers to entry for the fast.ai Practical Deep Learning for Coders course by automating the setup, configuration, and maintenance of a compatible programming environment, and to scale that experience from individual learners to entire groups.

In the same spirit as the course: just as you don't need a PhD to build AI applications, you shouldn't need to be a DevOps expert to get started with the course.

We've done all the work for you. All you need to do is dive in and get started!

Technical Notes

  1. When running the Docker image as a container in single-user mode, outside of Kubernetes, you will interact directly with the Jupyter Notebook interface (see: Quickstart: Running the Docker image locally).

  2. The JupyterHub Kubernetes deployment portion of this project is based on the official Zero to JupyterHub with Kubernetes guide and assumes you already have your own Kubernetes cluster set up. If you don't and are just starting out, Minikube is great for local development and Microk8s works well for single-node clusters (see the sketch below).
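For reference, a minimal single-node Microk8s setup might look something like the sketch below. The addon list and the MetalLB address range are illustrative, not prescriptive; without some LoadBalancer implementation, the proxy-public service will never receive an external IP:

# Minimal single-node cluster sketch (Ubuntu); addons and IP range are illustrative
sudo snap install microk8s --classic
sudo microk8s enable dns storage
# MetalLB hands out LoadBalancer IPs (pick a free range on your own network)
sudo microk8s enable metallb:10.64.140.43-10.64.140.49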

Advanced Usage

Makefile Overview

Available rules
build               Build Docker container
config.yaml         Generate JupyterHub Helm chart configuration file
deploy              Deploy JupyterHub to your Kubernetes cluster
push                Push image to Docker Hub container registry

Tip: invoking make without any arguments will display auto-generated documentation similar to the above.

Build and Push Your Own Docker Image

In addition to deployment, the Makefile contains facilities to build and push Docker images to your own repository. Simply edit the appropriate fields in Makefile and invoke make with one of: build, push (see the example below).
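For example, after pointing the image-related variables in Makefile at your own registry, invocations look like the following; the TAG override is an assumption that mirrors how the deploy rule is parameterized:

# Build the image locally, then push it to the registry configured in Makefile
make build TAG=v20201007-7890c25
make push TAG=v20201007-7890c25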

Enabling GitHub OAuth[2]

Determine your JupyterHub host address (the address you use in your browser to access JupyterHub) and add it to your .env file

JUPYTERHUB_IP=$(kubectl --namespace jhub get service proxy-public -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "export JUPYTERHUB_IP=${JUPYTERHUB_IP}" >> .env

Generate your GitHub OAuth credentials and add them to your .env file

Follow this tutorial: GitHub documentation: Building OAuth Apps - Creating an OAuth App, then:

GITHUB_CLIENT_ID=$YOUR_GITHUB_CLIENT_ID
GITHUB_CLIENT_SECRET=$YOUR_GITHUB_CLIENT_SECRET
echo "export GITHUB_CLIENT_ID=${GITHUB_CLIENT_ID}" >> .env
echo "export GITHUB_CLIENT_SECRET=${GITHUB_CLIENT_SECRET}" >> .env

Redeploy your JupyterHub instance

# Note: the `latest` tag is used here for expediency. When possible, you should
# pin your version by specifying an exact Docker image tag,
# e.g., `TAG=v20201007-7890c25`
make deploy TAG=latest

Now, the first time a user logs in to your JupyterHub instance, they will be greeted by GitHub's OAuth authorization screen.

Once they click "Authorize", users will automatically be authenticated via GitHub OAuth whenever they log in.

[2] see: JupyterHub documentation: Authenticating with OAuth2 - GitHub


Setup

source: JupyterHub documentation: Setting up JupyterHub

Note: commands in this section should be run on the command line from the root of your repository.

Generate a secret token for your JupyterHub deployment and place it in your local .env file

echo "export PROXY_SECRET=$(openssl rand -hex 32)" > .env
DANGER! DO NOT VERSION CONTROL THIS FILE!

If you need to store these values in version control, consider using something like SOPS.
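At a minimum, make sure the file cannot be committed by accident, e.g.:

# Keep the secret-bearing .env file out of version control
echo ".env" >> .gitignore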

Install Helm

source: JupyterHub documentation: Setting up Helm

  • Download and install
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
  • Verify installation and add JupyterHub Helm charts:
# Verify Helm
helm list
# Add JupyterHub Helm charts
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
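To confirm the chart repository was added correctly, you can search it before deploying:

# Should list the available `jupyterhub/jupyterhub` chart versions
helm search repo jupyterhub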

Deployment

source: JupyterHub documentation: Setting up JupyterHub

Generate a JupyterHub configuration file*

make config.yaml

This will create a config.yaml by populating fields of config.TEMPLATE.yaml with the pre-set deployment variables and values specified in your .env file.

* Anything generated here will be overwritten by the following deployment step with the most recent values, but this step is included for completeness.
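Before moving on, it can be worth spot-checking that the template fields were actually populated. The key names in the pattern below are assumptions based on the Zero to JupyterHub chart layout, so adjust them to match config.TEMPLATE.yaml:

# Spot-check the generated values (key names may differ in your template)
grep -nE "secretToken|image|tag" config.yaml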

Deploy JupyterHub to your Kubernetes cluster

Once you've verified config.yaml contains the correct information, on the command line, run:

# Note: the `latest` tag is used here for expediency. When possible, you should
# pin your version by specifying an exact Docker image tag,
# e.g., `TAG=v20201007-7890c25`
make deploy TAG=latest

This will deploy the JupyterHub instance to your cluster via the official Helm chart, parametrized by pre-set deployment variables and the config.yaml file you generated in the previous step.
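To watch the rollout complete, you can wait on the core deployments; the deployment names below are assumptions based on the default Zero to JupyterHub chart:

# Block until the core JupyterHub components have rolled out
kubectl --namespace jhub rollout status deployment/hub
kubectl --namespace jhub rollout status deployment/proxy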

To override a pre-set deployment variable, simply edit the appropriate value in Makefile.

A note on built-in image tag logic

The Makefile defaults to strongly versioned image tags (derived from Google's Kubeflow Central Dashboard Makefile) for unambiguous container image provenance.

Unless you are pushing to and pulling from your own registry, you MUST override the generated tag with your desired tag when deploying to your own cluster.
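For reference, the generated tags (e.g., v20201007-7890c25) appear to follow a date-plus-short-commit scheme. The snippet below is only an illustration of that scheme; the authoritative recipe lives in Makefile:

# Roughly how a date + short-commit tag is derived (illustrative, not the exact Makefile logic)
echo "v$(date +%Y%m%d)-$(git rev-parse --short=7 HEAD)"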


Further Reading

fast.ai: A non-profit research group focused on deep learning and artificial intelligence.

  • fastai: The free, open-source software library from fast.ai that simplifies training fast and accurate neural nets using modern best practices.

  • Practical Deep Learning for Coders: the creators of fastai show you how to train a model on a wide range of tasks using fastai and PyTorch. You’ll also dive progressively further into deep learning theory to gain a complete understanding of the algorithms behind the scenes.

Jupyter Notebook: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.



JupyterHub: A multi-user version of the notebook designed for companies, classrooms, and research labs.



Anaconda (conda for short): A free and open-source distribution of the Python and R programming languages for scientific computing, that aims to simplify package management and deployment.



Docker: A set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers.

  • A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.



Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications.

Disclaimer

Neither I nor my employer are affiliated in any way with fast.ai, Project Jupyter, or any other organizations responsible for any of the technologies used in this project.
