This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.
You will need a Google Cloud account with available quota for NVIDIA GPUs.
Edit the `require.tf` Terraform file to uncomment and add the details for your Google Cloud project:
$EDITOR require.tf
Modify the provided `terraform.tfvars` file to suit your project:
$EDITOR terraform.tfvars
- initialize Terraform and create the cluster:
terraform init
terraform apply --auto-approve
- get nodes:
export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
watch -n 1 kubectl get nodes
- create the GPU manifests:
kubectl apply -f manifests
- check the status of the driver installer:
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
- check the status of the device plugin:
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1) -n kube-system -f
- verify the worker node has allocatable GPUs:
kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
- let's inspect the GPU workload:
less manifests/darkapi.yaml
- let's see if the GPU workload has been scheduled:
watch -n 2 kubectl get pods
kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
- for fun, let's test the GPU workload:
export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
~/code/darkapi/client http://$INGRESS/api/yolo
- finally, let's clean up:
terraform destroy --auto-approve
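The `watch -n 1 kubectl get nodes` step above relies on eyeballing the output until every node is ready. A minimal bash sketch of a scripted alternative follows; `all_nodes_ready` and `wait_for_nodes` are hypothetical helper names, and the sketch assumes the STATUS column is the second field of `kubectl get nodes --no-headers` output.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and succeeds only if every node's STATUS (2nd column) is exactly "Ready".
# An empty node list counts as "not ready" so a polling loop keeps waiting.
all_nodes_ready() {
  awk '
    { total++; if ($2 == "Ready") ready++ }
    END { exit !(total > 0 && ready == total) }
  '
}

# Hypothetical polling helper: re-checks every INTERVAL seconds (default 5)
# until all nodes report Ready. Requires KUBECONFIG to be exported first.
wait_for_nodes() {
  local interval="${1:-5}"
  until kubectl get nodes --no-headers | all_nodes_ready; do
    sleep "$interval"
  done
}
```

Note that a cordoned node reports `Ready,SchedulingDisabled`, which this sketch deliberately treats as not ready.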
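The driver-installer `kubectl logs` step above passes every matching pod name to `kubectl logs`, which expects a single pod; with more than one GPU worker the `grep | awk` pipeline returns several names. A bash sketch, assuming pod names are the first column of `kubectl get pods --no-headers`, that picks only the first match (`first_pod_matching` is a hypothetical name):

```shell
# Hypothetical helper: prints the NAME of the first pod whose name contains
# PATTERN as a literal substring (not a regex), reading
# `kubectl get pods --no-headers` output on stdin.
# Exits non-zero if no pod matches.
first_pod_matching() {
  local pattern="$1"
  awk -v p="$pattern" '
    index($1, p) { print $1; found = 1; exit }
    END { exit !found }
  '
}
```

For example: `kubectl logs "$(kubectl get pods -n kube-system --no-headers | first_pod_matching nvidia-driver-installer)" -c modulus -n kube-system -f`.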
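The `kubectl describe node` step above prints a lot of text; the figure of interest is the GPU count under `Allocatable:`. A bash sketch that extracts it, assuming the device plugin advertises GPUs as the `nvidia.com/gpu` extended resource and that `describe` indents each resource under an unindented section header (`allocatable_gpus` is a hypothetical name):

```shell
# Hypothetical helper: reads `kubectl describe node` output on stdin and prints
# the nvidia.com/gpu value listed under the "Allocatable:" section.
# Prints nothing if the node advertises no GPUs.
allocatable_gpus() {
  awk '
    /^Allocatable:/ { in_alloc = 1; next }  # entering the Allocatable section
    /^[^ \t]/       { in_alloc = 0 }        # any unindented line starts a new section
    in_alloc && $1 == "nvidia.com/gpu:" { print $2 }
  '
}
```

Usage: `kubectl describe node "$NODE" | allocatable_gpus`. Parsing human-readable output is brittle; a structured query such as `kubectl get node "$NODE" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'` avoids it.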
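When inspecting `manifests/darkapi.yaml` above, the key detail is how a workload asks the scheduler for a GPU. The bash sketch below is not the darkapi manifest; it prints a minimal, hypothetical Pod manifest, assuming GPUs are exposed as the `nvidia.com/gpu` extended resource, which Kubernetes requests through `resources.limits`. The helper name, pod name, and image are placeholders supplied by the caller.

```shell
# Hypothetical helper: prints a minimal Pod manifest requesting GPUS
# (default 1) via the nvidia.com/gpu extended resource.
# NAME and IMAGE are caller-supplied placeholders.
gpu_pod_manifest() {
  local name="$1" image="$2" gpus="${3:-1}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
  - name: ${name}
    image: ${image}
    resources:
      limits:
        # Extended resources such as GPUs are requested via limits.
        nvidia.com/gpu: ${gpus}
EOF
}
```

The output can be piped straight into the cluster with `gpu_pod_manifest gpu-test "$IMAGE" | kubectl apply -f -`; until the device plugin reports allocatable GPUs, such a pod stays `Pending`.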
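The `terraform output | grep ingress_static_ip | awk '{print $3}'` pipeline above assumes `name = value` lines. It would also match any other output whose name merely contains `ingress_static_ip`, and some newer Terraform releases wrap string values in double quotes. A bash sketch that matches the name exactly and strips quotes (`tf_output_value` is a hypothetical name):

```shell
# Hypothetical helper: reads `terraform output` text ("name = value" lines)
# on stdin and prints the value for exactly NAME, with surrounding double
# quotes removed. Exits non-zero if NAME is not present.
tf_output_value() {
  local name="$1"
  awk -v n="$name" '
    $1 == n && $2 == "=" { v = $3; gsub(/^"|"$/, "", v); print v; found = 1; exit }
    END { exit !found }
  '
}
```

Usage: `export INGRESS="$(terraform output | tf_output_value ingress_static_ip)"`. Recent Terraform versions can also print a single raw value with `terraform output -raw ingress_static_ip`.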
Component | URL |
---|---|
Kubernetes installer | https://github.com/poseidon/typhoon |
GPU driver installer | https://github.com/squat/modulus |
Kubernetes device plugin | https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml |
Sample workload | https://github.com/squat/darkapi |