KubeCon EU 2018

This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.


Prerequisites

You will need a Google Cloud account with available quota for NVIDIA GPUs.
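
Before creating the cluster, it can help to confirm that the target region still has GPU quota to spare. The sketch below is not part of the demo. It assumes the `gcloud` CLI is installed, uses `us-central1` as a stand-in for your region, and expects quota rows as `metric,limit,usage` CSV; the helper name `gpu_quota_available` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Reads "metric,limit,usage" CSV rows on stdin and prints
# "<metric> <available>" for every NVIDIA GPU quota with headroom left.
gpu_quota_available() {
  awk -F, '$1 ~ /NVIDIA/ && ($2 - $3) > 0 { printf "%s %d\n", $1, $2 - $3 }'
}

# Assumed invocation; the region and the output format are illustrative:
#   gcloud compute regions describe us-central1 \
#     --flatten="quotas[]" \
#     --format="csv[no-heading](quotas.metric,quotas.limit,quotas.usage)" \
#     | gpu_quota_available
```

An empty result would suggest that no GPU quota is available in that region, in which case the GPU worker is unlikely to come up.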

Getting Started

Edit the require.tf Terraform file, uncommenting and filling in the details for your Google Cloud project:

$EDITOR require.tf

Modify the provided terraform.tfvars file to suit your project:

$EDITOR terraform.tfvars
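
A quick preflight check can catch a missing tool or file before Terraform starts provisioning. This is a minimal sketch rather than something shipped with the demo: the function names `missing_cmds` and `missing_files` are hypothetical, and the tool list is simply the set of commands used in the steps that follow.

```shell
#!/bin/sh
# Hypothetical preflight helpers; neither ships with the demo.

# Print every command from the argument list that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_cmds() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Print every path from the argument list that is not a regular file.
# Returns non-zero if at least one file is missing.
missing_files() {
  status=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}

# Example, run from the repository root before `terraform apply`:
#   missing_cmds terraform kubectl watch && missing_files require.tf terraform.tfvars
```

Both helpers print only what is missing, so silence together with a zero exit status means the check passed.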

Running

  1. create cluster:

    terraform apply --auto-approve
  2. get nodes:

    export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
    watch -n 1 kubectl get nodes
  3. create GPU manifests:

    kubectl apply -f manifests
  4. check status of driver installer:

    kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
  5. check status of device plugin:

    kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1 | tail -n1) -n kube-system -f
  6. verify worker node has allocatable GPUs:

    kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
  7. let's inspect the GPU workload:

    less manifests/darkapi.yaml
  8. let's see if the GPU workload has been scheduled:

    watch -n 2 kubectl get pods
    kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
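  9. for fun, let's test the GPU workload (this assumes a darkapi client binary at ~/code/darkapi/client; adjust the path to your own checkout):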

    export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
    ~/code/darkapi/client http://$INGRESS/api/yolo
  10. finally, let's clean up:

    terraform destroy --auto-approve
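
Several of the steps above repeat the same `grep | awk` pipeline or rely on `watch`, which is awkward to script. The helpers below are one possible way to factor that out; they are a sketch, not part of the demo, and the names `pod_names` and `all_nodes_ready` are hypothetical. They only parse the default table output of `kubectl get`, so they make no assumptions about labels in the manifests.

```shell
#!/bin/sh
# Hypothetical helpers that factor out the pipelines used in the steps.

# Read `kubectl get pods` output on stdin and print the names of the pods
# whose name matches the regular expression given as $1.
pod_names() {
  awk -v pat="$1" '$1 != "NAME" && $1 ~ pat { print $1 }'
}

# Read `kubectl get nodes` output on stdin. Succeed only if at least one
# node is listed and every node's STATUS column is exactly "Ready".
all_nodes_ready() {
  awk 'BEGIN { n = 0; bad = 0 }
       $1 != "NAME" { n++; if ($2 != "Ready") bad++ }
       END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Example usage against a live cluster:
#   until kubectl get nodes | all_nodes_ready; do sleep 1; done
#   kubectl logs -n kube-system -f \
#     "$(kubectl get pods -n kube-system | pod_names nvidia-gpu-device-plugin | head -n1)"
#
# Assumption: the device plugin advertises GPUs under the resource name
# nvidia.com/gpu. If so, allocatable GPUs per node can be listed with:
#   kubectl get nodes \
#     -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```

Because `all_nodes_ready` fails on empty input, the `until` loop keeps waiting while the API server is still unreachable instead of exiting early.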

Projects Leveraged In This Demo

| Component | URL |
| --- | --- |
| Kubernetes installer | https://github.com/poseidon/typhoon |
| GPU driver installer | https://github.com/squat/modulus |
| Kubernetes device plugin | https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml |
| Sample workload | https://github.com/squat/darkapi |