KubeCon EU 2018

This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.

Prerequisites

You will need a Google Cloud account with available quota for NVIDIA GPUs.

Edit the require.tf Terraform file, uncommenting and filling in the details for your Google Cloud project:

$EDITOR require.tf

Modify the provided terraform.tfvars file to suit your project:

$EDITOR terraform.tfvars

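Once Terraform has provisioned the cluster, point kubectl at the cluster's kubeconfig and watch the nodes register: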

export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
watch -n 1 kubectl get nodes
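If you would rather script this check than watch it, a small filter over the node listing can report how many nodes are still not ready. This is only a sketch: the function name not_ready_count is made up for illustration, and it assumes the usual `kubectl get nodes --no-headers` layout in which the second column is STATUS.

```shell
# Hypothetical helper: count nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_count() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Example use against a live cluster (left commented out so the block is safe to source):
# kubectl get nodes --no-headers | not_ready_count
```

Note that a cordoned node shows a status such as Ready,SchedulingDisabled and is therefore counted as not ready by this exact match.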

Check the status of the driver installer by following the logs of its modulus container:

kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f

Check the status of the device plugin by following the logs of the first matching pod:

kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1) -n kube-system -f
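The log commands above pick a pod by piping the pod listing through grep and awk. The same selection could be wrapped in one small helper, sketched below. The name first_pod is hypothetical, and the helper assumes the pod name is the first column of the `kubectl get pods` output.

```shell
# Hypothetical helper: print the name of the first pod whose name starts
# with the given prefix. Reads `kubectl get pods` output on stdin.
first_pod() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1; exit }'
}

# Example use against a live cluster (left commented out so the block is safe to source):
# kubectl logs "$(kubectl get pods -n kube-system | first_pod nvidia-gpu-device-plugin)" -n kube-system -f
```

Matching on a prefix rather than a substring avoids picking up unrelated pods that merely contain the same text somewhere in their name.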

Verify that the worker node has allocatable GPUs:

kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
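The describe output is long, so it can help to pull out just the GPU count. The filter below is a sketch under two assumptions: that the device plugin advertises GPUs under the resource name nvidia.com/gpu, and that the describe output lists it in an indented Allocatable: section. The function name allocatable_gpus is made up for illustration.

```shell
# Hypothetical helper: print the nvidia.com/gpu value from the Allocatable
# section of `kubectl describe node` output read on stdin.
# Assumes the resource name is nvidia.com/gpu.
allocatable_gpus() {
  awk '
    /^Allocatable:/ { sec = 1; next }   # entering the Allocatable section
    /^[^ \t]/       { sec = 0 }         # any other unindented header ends it
    sec && $1 == "nvidia.com/gpu:" { print $2 }
  '
}

# Example use against a live cluster (left commented out so the block is safe to source):
# kubectl describe node "$(kubectl get nodes | grep worker | awk '{print $1}')" | allocatable_gpus
```

An empty result means the node is not advertising any GPUs yet, which usually points back at the driver installer or device plugin logs.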

Check whether the GPU workload has been scheduled, then follow its logs:

watch -n 2 kubectl get pods
kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f

For fun, test the GPU workload. This assumes a darkapi client binary at ~/code/darkapi/client:

export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
~/code/darkapi/client http://$INGRESS/api/yolo
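The grep and awk pipeline above depends on the exact text that terraform output prints, and some Terraform versions wrap string values in double quotes. A slightly more tolerant extraction could look like the sketch below. The function name ingress_ip is hypothetical, and it assumes lines of the form `name = value`.

```shell
# Hypothetical helper: extract the ingress_static_ip value from
# `terraform output` text on stdin, stripping surrounding double quotes if present.
ingress_ip() {
  awk '$1 == "ingress_static_ip" { gsub(/"/, "", $3); print $3 }'
}

# Example use in the Terraform working directory (left commented out so the block is safe to source):
# export INGRESS="$(terraform output | ingress_ip)"
```

Matching the whole first field instead of grepping for a substring also avoids picking up other outputs whose names happen to contain ingress_static_ip.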

Component	URL
Kubernetes installer	https://github.com/poseidon/typhoon
GPU driver installer	https://github.com/squat/modulus
Kubernetes device plugin	https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
Sample workload	https://github.com/squat/darkapi