Sarah Gibson, The Alan Turing Institute
The Turing Way - making reproducible data science too easy not to do!
Link to this page: bit.ly/zero-to-binderhub-workshop
The following workshop will walk you through deploying a BinderHub onto a Microsoft Azure Kubernetes Service. It will be a publicly available service like mybinder.org.
Throughout the workshop, you will see some emojis. The 🚦 emoji indicates a TODO step and you should use a green or red post-it to indicate to the instructor if you've completed the TODO (green) or are having problems (red). The ❓ emoji indicates a point where the instructor will pause to take questions. These pauses are placed either at the end of a section to allow everyone to move on together, or after a step that may need more explanation.
You will need the following resources to be able to participate in this workshop:
- A Docker Hub account - sign up here: https://hub.docker.com/signup
And either:
- A Microsoft Azure Free Trial subscription - sign up here: https://azure.microsoft.com/en-gb/free/
Or:
- A Microsoft Azure Pass - You will have been given instructions on how to acquire one of these by the instructor before the workshop.
Important: If possible, please do not create an Azure account with a ".ac.uk" domain or an address affiliated with an organisation. This may cause issues during deployment.
To complete the extra curricular steps, you will also need a GitHub account.
- General Information
- Computational Requirements
- Building a BinderHub
- Extra Curricular Steps
- Tearing Down your BinderHub Deployment
- Example config files
- Glossary of Kubernetes terms
- Reference Documentation
BinderHub is a Cloud-based, open source technology that can host multiple instances of a Git repository and its computing environment. This allows code to be instantly runnable and reproducible by anyone anywhere in the world at the click of a link. The public Binder service is hosted at mybinder.org.
Since Binder and BinderHub are open source projects maintained by volunteers, they ask that users of mybinder.org stay within certain limitations in order to keep running costs as low as possible while still providing a usable service.
By deploying your own BinderHub, you can provide a service to your users that is much more customised to their needs. The most desirable benefit of this is allowing your users access to private repositories and handling sensitive data. But customisations could also include authentication, greater computational resources per user, bespoke package stacks or persistent user storage.
The Binder team's overview of the BinderHub architecture can be found here.
- A Kubernetes cluster - This is an automated system for deploying, scaling and maintaining containerised applications. Kubernetes can be used on any kind of server, Cloud-based or otherwise.
- The BinderHub Helm Chart - Helm is a package manager for deploying, maintaining and upgrading applications on a Kubernetes service. A Helm Chart is a collection of install instructions for a set of Kubernetes resources. The BinderHub Helm Chart contains a set of tools required by the Binder service:
  - repo2docker - This is a tool that builds a Docker container out of a Git repository based on a configuration file which describes the software dependencies.
  - JupyterHub - This is a tool for serving Jupyter Notebooks to multiple users. It automatically spawns, manages and proxies the server instances.
  - The Binder Load Balancer - This coordinates the launches of Binder instances and passes various jobs to repo2docker and the JupyterHub.
- A container registry, we will use DockerHub - A container registry is a storage and management server (usually Cloud-based) for containerised environments. The BinderHub will need to push (or "save") containers it has built to a registry, and pull (or "download") containers in order to run them from the JupyterHub.
This workshop assumes you have an account with Microsoft Azure.
Either a Free Trial subscription: It's quick to set one up and you get £150 free credit for the first 30 days as well as access to some always free services. You will be asked to provide a credit card. This is only for identity verification; you will not be charged. When your free trial expires, your resources will automatically be frozen and then deleted after a month if you don't reactivate your subscription.
Or an Azure Pass: You will have been given instructions on how to acquire one of these by the instructor before the workshop.
NOTE: Please do not sign up with a ".ac.uk" domain or organisation affiliated address as you may encounter some issues with Service Principal permissions when we deploy the Kubernetes cluster.
BinderHub is Cloud-neutral. We are using Azure as an example.
These instructions will link the BinderHub to a DockerHub Container Registry, and so you will need a DockerHub account as well. This will be where BinderHub stores the images it builds.
BinderHub also works with Google Container Registry, Azure Container Registry and custom registries. We are using DockerHub as an example.
Building a BinderHub requires a few pieces of sensitive information, such as access tokens and passwords. In this workshop, we will be saving this information to disk which is not ideal.
The ideal scenario would be to store this information in an Azure Key Vault such that the secrets could be programmatically added to the relevant files and then locally deleted as necessary. However, this falls out of the scope of a BinderHub workshop.
You can access Key Vault Quickstarts and Tutorials here.
This workshop will use the Cloud Shell in Azure Portal as it already has all of the tools we need preinstalled.
- Login to the Portal here: https://portal.azure.com/
- Open the Cloud Shell from the top of the dashboard:
The first time you open the Cloud Shell, you will be asked to create some persistent storage. This allows you to save and retrieve scripts, templates and configuration files. Accept any prompts you see.
🚦 🚦 🚦 🚦 🚦
We're going to download some template YAML files and a shell script that will automatically populate them with information using sed.
This will make the BinderHub setup less intensive.
Make a folder to store these files and change into it.
mkdir testhub
cd testhub
🚦 🚦 🚦 🚦 🚦
Run the following three commands to download the setup.sh script and the config-template.yaml and secret-template.yaml files.
wget -O setup.sh http://bit.ly/config-setup-script
wget -O config-template.yaml http://bit.ly/config-template
wget -O secret-template.yaml http://bit.ly/secret-template
🚦 🚦 🚦 🚦 🚦
Make the shell script executable with the following command.
chmod 700 setup.sh
🚦 🚦 🚦 🚦 🚦
Now make a secrets folder inside testhub where we will save secrets.
mkdir secrets
🚦 🚦 🚦 🚦 🚦
NOTE: If you are using version control, it is strongly recommended that you add the secrets folder to a .gitignore file to ensure secret information is not made public.
Do this before adding any secrets to the folder!
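One way to add that rule from the command line is sketched below. It assumes you are inside the testhub folder and that the folder is (or will become) a Git repository. The check before appending is only a convenience so that rerunning the snippet does not duplicate the line.

```bash
# Run from inside the testhub folder, before any secrets are created.
touch .gitignore
# Append the rule only if an identical line is not already present.
grep -qxF 'secrets/' .gitignore || echo 'secrets/' >> .gitignore
```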
Adapted from Step Zero: Kubernetes on Microsoft Azure Container Service (AKS).
A short (but by no means exhaustive) glossary of Kubernetes terms is given at the end of this workshop, should you require further explanation.
Run the following command:
az login
You should then see a message reading:
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code A-RANDOM-CODE to authenticate.
Visit the webpage https://microsoft.com/devicelogin and enter the random code as it appears in your cloud shell. Then select the account that you would like to sign in with.
To see a list of Azure subscriptions you have available to you, you can run the following command.
az account list --refresh --output table
This prints your subscriptions to the terminal in a human-readable format. Now let's set our working subscription.
If you are using a Free Trial subscription:
az account set --subscription "Free Trial"
If you are using an Azure Pass:
az account set --subscription "Azure Pass - Sponsorship"
🚦 🚦 🚦 🚦 🚦
NOTE: If your subscription name has no whitespace, the quotation marks are not required.
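To confirm which subscription is now active, its name can be read back with az account show. The helper below is a small sketch: check_subscription is a hypothetical name chosen for illustration, not part of the Azure CLI.

```bash
# check_subscription is a hypothetical helper, not an Azure CLI command.
# It compares the active subscription name with the one you expect.
check_subscription() {
  local expected="$1"
  local active
  active="$(az account show --query name --output tsv)"
  if [ "$active" = "$expected" ]; then
    echo "Active subscription: ${active}"
  else
    echo "Expected '${expected}' but the active subscription is '${active}'" >&2
    return 1
  fi
}

# Example usage:
# check_subscription "Free Trial"
```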
Resource Groups are how the Azure environment labels services that are related to each other (further details in this blog post). We will create a resource group in a specific region and create computational resources within this group.
az group create --name testhub \
--location westeurope \
--output table
- --name specifies the name of your resource group and should be something that uniquely identifies this hub from other hubs you may deploy.
- --location specifies the region of the data centres where your resource will exist. A list of data centre regions and locations can be found here. We have chosen West Europe for resource availability.
- --output table specifies the output should be in human-readable ASCII table format as opposed to JSON, which is the default output.
🚦 🚦 🚦 🚦 🚦
The following command requests a Kubernetes cluster within the resource group we created, with one Standard_D2s_v3 virtual machine as a 'node'.
Information on other types of virtual machines is available.
NOTES:
- --name (hubcluster in this example) cannot exceed 63 characters and can only contain letters, numbers, or hyphens (-).
- If you are not using a Free Trial subscription, try setting --node-count to 3 instead.
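If you pick your own cluster name, the naming rule in the first note can be checked before calling az. This is a minimal sketch: valid_cluster_name is a hypothetical helper that tests only the two constraints stated above, and Azure may enforce further rules that it does not cover.

```bash
# valid_cluster_name is a hypothetical helper that checks only the two rules
# stated above: at most 63 characters, and letters, numbers or hyphens only.
valid_cluster_name() {
  local name="$1"
  [[ ${#name} -le 63 && "$name" =~ ^[A-Za-z0-9-]+$ ]]
}

# Example usage:
# valid_cluster_name hubcluster && echo "name looks OK"
```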
az aks create --name hubcluster \
--resource-group testhub \
--no-ssh-key \
--node-count 1 \
--node-vm-size Standard_D2s_v3 \
--output table
- --name is the name of the cluster.
- --resource-group is the resource group we created in Step 3: Create a Resource Group.
- --node-count is the number of desired nodes in the Kubernetes cluster.
- --node-vm-size is the size of the nodes you wish to use, which varies based on the use-case of the cluster and how much RAM/CPU each user will need.
NOTE: The default version of Kubernetes will be installed. You can use the --kubernetes-version flag to install a different version.
This step may take some time to execute!
🚦 🚦 🚦 🚦 🚦
WARNING: If this step fails due to insufficient permissions on the directory, this is likely because you're using a ".ac.uk" email to login to Azure and your institution has limited your ability to create Service Principals. For the sake of this workshop, you should have created your account with a different email address. Otherwise, you could ask your IT Services to provide you with a Service Principal.
If your Cloud Shell has timed out while the cluster was creating, you can do the following steps to retrieve the logs and see if the deployment was successful.
In Azure Portal, select "Resource Groups" from the left-most menu panel.
From the list of resource groups, select testhub.
From the left-side menu, select "Activity log".
The logs for deploying the cluster will be under "Managed Cluster". The second column contains the status of the deployment. This will say "Succeeded" if everything went well. If not, opening the logs should provide more detail as to what went wrong.
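The same check can be made from a fresh Cloud Shell session instead of the Portal. The sketch below wraps az aks show and prints the cluster's provisioning state, which should read Succeeded once deployment has finished. cluster_state is a hypothetical helper name, and its defaults assume the names used in this workshop.

```bash
# cluster_state is a hypothetical helper name; the cluster and resource group
# names default to the ones used in this workshop.
cluster_state() {
  local cluster="${1:-hubcluster}"
  local group="${2:-testhub}"
  az aks show --name "$cluster" --resource-group "$group" \
    --query provisioningState --output tsv
}

# Example usage:
# cluster_state
```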
Once this command has completed, some extra resource groups will have been created, which is normal behaviour.
You can inspect them in the Azure Portal.
The testhub group will contain the Kubernetes service, whereas a new resource group called MC_testhub_hubcluster_westeurope, containing the cluster resources (virtual machines, etc.), will have appeared.
There will also be a NetworkWatcherRG group, which will be empty.
This group is created under the assumption that the Kubernetes service will be extended in the future, which is unlikely to be the case when deploying BinderHub.
This group can be deleted.
This step automatically updates your local Kubernetes client configuration file so that it points at the remote cluster we've just deployed, allowing kubectl to be "logged-in" to the cluster.
az aks get-credentials --name hubcluster --resource-group testhub
- --name is the cluster name defined in Step 4: Create an Azure Container Service (AKS) Cluster.
- --resource-group is the resource group created in Step 3: Create a Resource Group.
kubectl get nodes
The output of this command should list one node (unless you changed --node-count in Step 4: Create an Azure Container Service (AKS) Cluster) with a STATUS of Ready.
The VERSION field reports which version of Kubernetes is installed.
Example output:
NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-97000712-0   Ready    agent   19m   v1.9.11
🚦 🚦 🚦 🚦 🚦
❓ ❓ ❓ ❓ ❓
Adapted from Zero-to-JupyterHub: Setting up Helm.
Helm is the package manager for Kubernetes and is used for installing, upgrading and managing applications on a Kubernetes cluster. Helm packages are called charts. Helm manages releases (installations) and revisions (versions) of charts deployed on the cluster.
Did you know? Kubernetes is Greek for "captain" or "helmsman", in case you haven't noticed the nautical theme!
To verify you have access to the correct version, run the following command.
helm version --short
Example output:
v3.3.4+ga61ce56
Then check it has been installed properly by running the following command.
helm list
Example output:
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
🚦 🚦 🚦 🚦 🚦
❓ ❓ ❓ ❓ ❓
Adapted from Zero-to-BinderHub: Setup BinderHub.
Before we install a BinderHub, we need to configure several pieces of information and save them in YAML files.
Create two random tokens:
openssl rand -hex 32 > secrets/apiToken.txt
openssl rand -hex 32 > secrets/secretToken.txt
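Each file should now hold 64 hexadecimal characters (32 random bytes, hex-encoded). The sketch below is one way to sanity-check that before the tokens are copied into the config files. check_token is a hypothetical helper, and the chmod line is an optional extra rather than part of the original workshop.

```bash
# check_token is a hypothetical helper: it succeeds only if the file holds
# exactly 64 lowercase hexadecimal characters (32 random bytes, hex-encoded).
check_token() {
  local token
  token="$(tr -d '\n' < "$1")"
  [[ "$token" =~ ^[0-9a-f]{64}$ ]]
}

# Example usage, from inside the testhub folder:
# check_token secrets/apiToken.txt && check_token secrets/secretToken.txt
# chmod 600 secrets/*.txt   # optional: restrict the files to your own user
```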
🚦 🚦 🚦 🚦 🚦
Now run setup.sh.
This will populate the secret-template.yaml and config-template.yaml templates with the appropriate information and save the results as secret.yaml and config.yaml in the secrets folder.
You will be asked to provide your DockerHub login credentials in order to connect your DockerHub account to the BinderHub. You must provide your Docker username, not your email address associated with the account.
./setup.sh
🚦 🚦 🚦 🚦 🚦
First, pull the latest Helm chart for BinderHub.
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart
helm repo update
🚦 🚦 🚦 🚦 🚦
Next, install the required Helm chart using the config files we created in Step 2: Run setup.sh.
helm install binderhub jupyterhub/binderhub \
--version=0.2.0-f565958 \
--namespace=binderhub \
-f secrets/secret.yaml \
-f secrets/config.yaml \
--create-namespace
- --version refers to the version of the BinderHub Helm Chart. Available versions can be found here. We have used the version released on 11 August 2019.
- It is recommended that --namespace be the same as the provided NAME in order to avoid confusion. It should be something short and descriptive. In this case, our NAME is binderhub, defined immediately after helm install.
- --create-namespace: In Helm v3, the namespace is no longer automatically created if it doesn't already exist. We use this flag to restore the earlier behaviour.
This step will deploy both a JupyterHub and a BinderHub but they are not yet configured to communicate with one another. You may need to wait a few moments before moving on as the resources may take a while to be set up.
🚦 🚦 🚦 🚦 🚦
Print the IP address of the JupyterHub that was just deployed by running the following command.
It will be listed in the EXTERNAL-IP field.
kubectl get svc proxy-public --namespace=binderhub
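The EXTERNAL-IP field can show pending for a short while after installation. As an alternative to rerunning the command by hand, the sketch below polls the Service and prints only the IP address once the load balancer reports one. wait_for_external_ip is a hypothetical helper name, and the polling interval and attempt count are arbitrary choices.

```bash
# wait_for_external_ip is a hypothetical helper. It polls a Service until the
# load balancer reports an IP address, then prints that address.
wait_for_external_ip() {
  local service="$1"
  local namespace="${2:-binderhub}"
  local attempts="${3:-30}"
  local ip=""
  local i
  for ((i = 0; i < attempts; i++)); do
    ip="$(kubectl get svc "$service" --namespace="$namespace" \
      --output jsonpath='{.status.loadBalancer.ingress[0].ip}')"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep 10
  done
  echo "No external IP for ${service} after ${attempts} attempts" >&2
  return 1
}

# Example usage:
# jupyter_ip="$(wait_for_external_ip proxy-public)"
```

The same helper should work for the binder Service later in the workshop by passing binder as the first argument.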
🚦 🚦 🚦 🚦 🚦
We now need to edit some files in order to pass the IP address we just retrieved to the BinderHub deployment and allow the BinderHub and JupyterHub to share information.
The Azure Cloud Shell comes pre-installed with some terminal-based editors such as vi and nano.
I will demonstrate using nano but feel free to use whichever editor you feel most comfortable with.
If you are following along with me and have not used nano before, here are the basics.
To open the file, type nano FILENAME.
To close the file, press Ctrl + X (^X on a Mac).
It will ask you if you wish to save your edits before closing.
Now do the following steps:
- On line 5 of setup.sh, copy the IP address from the last command into the jupyter_ip variable and uncomment the line (remove the # from the beginning).
- Again in setup.sh, move the line reading # -e "s/<jupyter-ip>/${jupyter_ip}/" \ (Line 27) above the line config-template.yaml > secrets/config.yaml and uncomment it by removing the # from the start.
- Uncomment line 8 of config-template.yaml by removing the # from the beginning.
There are examples of how setup.sh and config.yaml should look after these edits at the end of this document.
🚦 🚦 🚦 🚦 🚦
Rerun setup.sh.
./setup.sh
🚦 🚦 🚦 🚦 🚦
Now upgrade the Helm chart to deploy the change.
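helm upgrade binderhub jupyterhub/binderhub \
--version=0.2.0-f565958 \
--namespace=binderhub \
-f secrets/secret.yaml \
-f secrets/config.yaml \
--cleanup-on-fail
- --namespace must match the namespace used with helm install, because Helm v3 looks up releases within a namespace.
- If there was an error during the upgrade process, --cleanup-on-fail will remove any new resources created during the upgrade. This will make the next deployment cleaner.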
🚦 🚦 🚦 🚦 🚦
Find the IP address of your BinderHub under the EXTERNAL-IP field.
kubectl get svc binder --namespace=binderhub
NOTE: THIS IS A DIFFERENT COMMAND TO THE LAST ONE!
Copy the IP address into your browser and your BinderHub should be waiting.
🚦 🚦 🚦 🚦 🚦
If you've been successful, a page identical to mybinder.org should appear. Type the following URL into the GitHub repo box and launch it: https://github.com/binder-examples/requirements. You can even sign in to your Docker account to see when the image has been pushed to the registry.
While trying to launch a repository, you may come across an Internal Server Error.
In the case of this workshop, this is most likely due to the Kubernetes cluster only having a single node.
For a stable cluster, it is recommended to deploy on at least three nodes; however, this is not permitted on a Free Trial subscription.
Another reason the Internal Server Error may appear is that you've tried to launch a repository immediately after a helm upgrade.
It can take some time after an upgrade for the JupyterHub to reset its internal state.
Always check that all pods have Running status using kubectl get pods -n binderhub before trying to launch a repo.
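Scanning the STATUS column by eye works for a handful of pods. The sketch below automates it by printing any pod whose status is not Running. pods_not_running is a hypothetical helper, and it relies on the default column layout of kubectl get pods (NAME, READY, STATUS, ...). Pods that have finished normally, shown as Completed, would also be listed, so treat the output as a prompt to look closer rather than a definite error.

```bash
# pods_not_running is a hypothetical helper. It prints the name and status of
# every pod whose STATUS column is not "Running", and fails if there are any.
pods_not_running() {
  local namespace="${1:-binderhub}"
  local pending
  pending="$(kubectl get pods --namespace "$namespace" --no-headers |
    awk '$3 != "Running" { print $1, $3 }')"
  if [ -n "$pending" ]; then
    echo "$pending"
    return 1
  fi
}

# Example usage:
# pods_not_running && echo "All pods are Running"
```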
❓ ❓ ❓ ❓ ❓
If something is not working correctly with your BinderHub, the quickest way to find the problem is to access the JupyterHub logs. Executing the following commands will print the JupyterHub logs to your terminal.
# Lists all active pods. Find the one beginning with "hub-"
kubectl get pods --namespace binderhub
# Where <random-str> matches the output from the last step
kubectl logs hub-<random-str> --namespace binderhub
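The two commands above can be combined so that the pod name does not have to be copied by hand. This is a sketch only: hub_logs is a hypothetical helper, and it assumes exactly one pod in the namespace has a name beginning with hub-.

```bash
# hub_logs is a hypothetical helper that combines the two commands above:
# it looks up the pod whose name starts with "hub-" and prints its logs.
hub_logs() {
  local namespace="${1:-binderhub}"
  local pod
  pod="$(kubectl get pods --namespace "$namespace" --no-headers |
    awk '$1 ~ /^hub-/ { print $1; exit }')"
  if [ -z "$pod" ]; then
    echo "No pod starting with hub- found in namespace ${namespace}" >&2
    return 1
  fi
  kubectl logs "$pod" --namespace "$namespace"
}

# Example usage:
# hub_logs | tail -n 50
```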
You can also access information about individual pods with the following command.
kubectl describe pod <POD-NAME> --namespace binderhub
One fun way to make your BinderHub your own is to change the logo that appears on the Binder launch page.
This is achieved by customising or extending the html template the website is built from.
See the BinderHub customisation documentation for more details.
You will need an image file. Unsplash is a good source of free images for the purposes of this workshop. For your production BinderHub, you would probably create your own logo!
I have created a template repo to make this easier - linked below. Fork it to your own GitHub account.
github.com/alan-turing-institute/binderhub-custom-files
NOTE: The repo hosting your html pages must be public.
Upload your chosen image file to the static folder on your fork.
Any image file type will suffice.
In the templates folder, edit page.html to contain the name of your image file by replacing <custom-logo-file>.
(You do not need to include the static/ prefix.)
Add the following to your config.yaml file.
Remember to replace <your-github-username> in the repo URL with your GitHub username!
config:
  BinderHub:
    template_path: /etc/binderhub/custom/templates
    extra_static_path: /etc/binderhub/custom/static
    extra_static_url_prefix: /extra_static/
    template_variables:
      EXTRA_STATIC_URL_PREFIX: "/extra_static/"
initContainers:
  - name: git-clone-templates
    image: alpine/git
    args:
      - clone
      - --single-branch
      - --branch=main
      - --depth=1
      - --
      - https://github.com/<your-github-username>/binderhub-custom-files
      - /etc/binderhub/custom
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: custom-templates
        mountPath: /etc/binderhub/custom
extraVolumes:
  - name: custom-templates
    emptyDir: {}
extraVolumeMounts:
  - name: custom-templates
    mountPath: /etc/binderhub/custom
NOTE: If you committed the image file and the change to page.html to a branch other than main, then you either need to merge your changes into main or change the --branch argument in the above snippet to match the name of your branch.
To deploy the changes, upgrade the Helm chart.
helm upgrade binderhub jupyterhub/binderhub \
--version=0.2.0-f565958 \
--namespace=binderhub \
-f secrets/secret.yaml \
-f secrets/config.yaml \
--cleanup-on-fail
Visit your Binder page to see your new logo! To get the IP address of the Binder page, run the following command.
kubectl get svc binder --namespace=binderhub
Adapted from Enabling Authentication and Authentication.
The default is for BinderHub to run without authentication and, for each launch, it creates a temporary user and starts a server for that user.
You can enable authentication for BinderHub by using JupyterHub as an OAuth provider by editing config.yaml.
You should update config-template.yaml and setup.sh accordingly to handle the new information.
First add auth_enabled: true under the config.BinderHub key.
Then add the following as a top-level (unindented) key.
NOTE: In the following, binderhub_url is the IP address you visit to reach your Binder launch page (i.e. the output of Step 5: Try out your BinderHub deployment!) and jupyterhub_url is the IP address listed under config.BinderHub.hub_url at the top of config.yaml.
jupyterhub:
  cull:
    # don't cull authenticated users
    users: False
  custom:
    binderauth_enabled: true
  hub:
    redirectToServer: false
    services:
      binder:
        oauth_redirect_uri: "http://<binderhub_url>/oauth_callback"
        oauth_client_id: "binder-oauth-client-test"
  singleuser:
    # to make notebook servers aware of hub
    cmd: jupyterhub-singleuser
  auth:
    type: github
    github:
      clientId: "<Your GitHub Client ID>"
      clientSecret: "<Your GitHub Client Secret>"
      callbackUrl: "http://<jupyterhub_url>/hub/oauth_callback"
NOTE: We will generate clientId and clientSecret in the next step.
Go to GitHub, click your profile picture (in the top right corner) and select "Settings" from the drop down menu. At the bottom of the list on the left, select "Developer settings", then click "New OAuth App".
Fill in the form using your binderhub_url and jupyterhub_url from Step 1: Editing config.yaml (see image below) and click "Register Application".
The URL entered into the "Authorization callback URL" field must match auth.github.callbackUrl in your config.yaml.
Once your App is registered, a Client ID and Client Secret will be generated.
Copy these, as strings, into the clientId and clientSecret fields in config.yaml respectively.
To apply the config changes, we need to upgrade the deployed Helm chart using the same command as in Step 4: Connect JupyterHub and BinderHub.
helm upgrade binderhub jupyterhub/binderhub \
--version=0.2.0-f565958 \
--namespace=binderhub \
-f secrets/secret.yaml \
-f secrets/config.yaml \
--cleanup-on-fail
Now reload your Binder page. You should see a sign in button and will be asked for your GitHub sign in information!
Adapted from Tearing Everything Down.
When you're no longer using your BinderHub, you should destroy it to avoid paying extra costs for it! This involves deleting the Helm release and all of the computing resources in Azure.
First we delete the Helm release that installed the JupyterHub and BinderHub and any resources that it created.
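helm delete binderhub --namespace=binderhub
NOTE: binderhub is the release name we defined in Step 3: Install BinderHub. The --namespace flag is needed because Helm v3 looks up releases within a namespace.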
Next we delete the Kubernetes namespace the hub was installed in. This will delete any disks that were created to store users' data and any IP addresses.
kubectl delete namespace binderhub
You can list your active resource groups using the following command.
az group list --output table
You can then delete the group for your BinderHub.
az group delete --name testhub --no-wait
NOTE:
- Be careful to select the correct resource group as this step will irreversibly delete all the resources in that group!
- testhub is the name of the resource group we created in Step 3: Create a Resource Group.
You can use the Azure Portal to double check all of your resources have been deleted. It may take a few minutes to clear up, but nothing relating to your BinderHub should remain after this step.
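Because --no-wait returns immediately, the deletion carries on in the background. As an alternative to checking the Portal, the sketch below polls az group exists until the group has gone. wait_for_group_deletion is a hypothetical helper, and the polling interval and attempt count are arbitrary choices.

```bash
# wait_for_group_deletion is a hypothetical helper. "az group exists" prints
# true or false, so we poll until it reports false.
wait_for_group_deletion() {
  local group="${1:-testhub}"
  local attempts="${2:-30}"
  local i
  for ((i = 0; i < attempts; i++)); do
    if [ "$(az group exists --name "$group")" = "false" ]; then
      echo "Resource group ${group} no longer exists"
      return 0
    fi
    sleep 20
  done
  echo "Resource group ${group} still exists" >&2
  return 1
}

# Example usage:
# wait_for_group_deletion testhub
```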
You should also delete the NetworkWatcherRG group if you did not do so earlier.
az group delete --name NetworkWatcherRG --no-wait
If you enabled GitHub authentication on your BinderHub, don't forget to delete the OAuth Application in "Developer Settings" on github.com as well.
jupyterhub:
  hub:
    services:
      binder:
        apiToken: "<output of FIRST 'openssl rand -hex 32' command>"
  proxy:
    secretToken: "<output of SECOND 'openssl rand -hex 32' command>"
registry:
  username: <docker-id>
  password: <password>
NOTE: <docker-id> must be your DockerHub username, not your email address.
config:
  BinderHub:
    use_registry: true
    image_prefix: <docker-id OR organisation-name>/<prefix>-
    hub_url: http://<EXTERNAL-IP of proxy-public from Step 4>
NOTE:
- If your Docker account is part of an organisation where you would like to store images instead, change the value of image_prefix to <docker-organisation-name>/<prefix>-
- The <prefix> can be any string since it will be prepended to image names. It is recommended to be something short and descriptive, such as binder-dev- (for development) or binder-prod- (for the final product).
#!/bin/bash
# Variables
prefix=binder-dev # Docker image prefix
jupyter_ip=xx.xx.xx.xx # Fill in the IP Address of your JupyterHub here
# Get DockerHub login details here
echo Please provide your DockerHub login details.
read -p "DockerHub ID (NOT email): " docker_id
read -sp "DockerHub password: " docker_pass
echo
# Make secrets directory if it doesn't already exist
mkdir -p secrets
# Populate secret.yaml
sed -e "s/<apiToken>/$(cat secrets/apiToken.txt)/" \
-e "s/<secretToken>/$(cat secrets/secretToken.txt)/" \
-e "s/<docker-id>/${docker_id}/" \
-e "s/<password>/${docker_pass}/" \
secret-template.yaml > secrets/secret.yaml
# Populate config.yaml
sed -e "s/<docker>/${docker_id}/" \
-e "s/<prefix>/${prefix}/" \
-e "s/<jupyter-ip>/${jupyter_ip}/" \
config-template.yaml > secrets/config.yaml
# End script with some outputs
echo Your BinderHub files have been configured!
ls secrets/
echo
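The script above inserts the DockerHub password directly into a sed replacement, so a password containing /, & or \ would break or alter the substitution. One possible workaround, sketched below, is to escape those characters first. escape_sed_replacement is a hypothetical helper and not part of the workshop's setup.sh. It does not handle passwords containing newlines.

```bash
# escape_sed_replacement is a hypothetical helper, not part of setup.sh.
# It backslash-escapes the characters that are special in the replacement
# part of s/.../.../ : the delimiter (/), the ampersand and the backslash.
escape_sed_replacement() {
  printf '%s' "$1" | sed -e 's|[/&\\]|\\&|g'
}

# Example: substitute a password containing "/" and "&" into a template line.
docker_pass='pa/ss&wd'
safe_pass="$(escape_sed_replacement "$docker_pass")"
printf 'password: <password>\n' | sed -e "s/<password>/${safe_pass}/"
```

In setup.sh, the same idea would mean passing the escaped value, rather than docker_pass itself, to the sed expression that fills in the password.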
- Cluster: a group of computing machines (real or virtual) to deploy apps or containers into
- Deployment: instructions to Kubernetes on how to update instances of a deployed application
- Nodes: Workers that run the applications
- Pod: a Kubernetes abstraction representing a group of one or more application containers and some shared resources
- Service: an abstraction defining a set of Pods and how they can be accessed; a Service is defined using YAML (or JSON) and allows applications to receive traffic/be exposed outside of the Cluster