Short video at http://bit.ly/ocp4kubeflow
The above solution uses Dex as OIDC provider. It should be relatively straightforward to use RH-SSO/Keycloak. Please find labs and videos with RH-SSO on OpenShift 4.2 at http://bit.ly/33rjfry
For detailed instructions and videos on setting up CodeReady Containers / OpenShift 4.2 on bare metal servers, please see:
crc version && oc version && kfctl version
version: 1.0.0+575079b OpenShift version:4.2.0 (embedded in binary) Client Version: v4.3.0 Server Version: 4.2.0 Kubernetes Version: v1.14.6+2e5ed54 kfctl v0.7.0-rc.5-7-gc66ebff3
Before installing Kubeflow, we need to take some further actions. The disk space of the VM is too low, so we will expand it first:
sudo dnf -y install libguestfs libguestfs-tools
export CRC_MACHINE_IMAGE=${HOME}/.crc/machines/crc/crc
#example : export CRC_MACHINE_IMAGE=/home/demouser/.crc/machines/crc/crc
crc stop
#adding 500G
sudo qemu-img resize ${CRC_MACHINE_IMAGE} +500G
sudo cp ${CRC_MACHINE_IMAGE} ${CRC_MACHINE_IMAGE}.ORIGINAL
sudo virt-resize --expand /dev/sda3 ${CRC_MACHINE_IMAGE}.ORIGINAL ${CRC_MACHINE_IMAGE}
sudo rm ${CRC_MACHINE_IMAGE}.ORIGINAL
crc setup
crc start
crc status
Now login as kubeadmin:
oc login -u kubeadmin -p wyozw-5ywAy-5yoap-7rj8q https://api.crc.testing:6443
After the VM starts, we need to take some actions to ensure everything runs correctly:
-
The PVs provided by CRC are of type hostPath. This means that securityContext settings like fsGroup won’t work correctly, which is a problem for Jupyter Notebooks and the OIDC AuthService. To fix this, we ssh into the VM and make the PV folder read/writeable by all users:
ssh -i ~/.crc/machines/crc/id_rsa [email protected] sudo chmod "0777" -R /mnt/pv-data/ exit
-
Some of the Kubeflow Pods require higher permissions than what Openshift gives them. To remedy that, we are going to add certain ServiceAccounts to higher permissions [SCCs](https://blog.openshift.com/understanding-service-accounts-sccs/).
# istio-system oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-ingress-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:default oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:grafana oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:prometheus oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-egressgateway-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-citadel-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-ingressgateway-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-cleanup-old-ca-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-mixer-post-install-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-mixer-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-pilot-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-sidecar-injector-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-galley-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-multi oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:builder oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:deployer oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-cleanup-secrets-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-grafana-post-install-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:istio-security-post-install-account oc adm policy add-scc-to-user anyuid system:serviceaccount:istio-system:kiali-service-account oc adm policy add-scc-to-user anyuid -z istio-egressgateway-service-account -n istio-system oc adm policy add-scc-to-user anyuid -z istio-citadel-service-account -n istio-system oc adm policy add-scc-to-user anyuid -z istio-ingressgateway-service-account -n istio-system oc adm policy add-scc-to-user anyuid -z istio-cleanup-old-ca-service-account -n istio-system oc adm policy add-scc-to-user anyuid -z istio-mixer-post-install-account -n istio-system oc adm policy add-scc-to-user anyuid -z istio-mixer-service-account -n istio-system oc adm policy add-scc-to-user anyuid -z istio-pilot-service-account -n istio-system oc adm policy add-scc-to-user anyuid -z istio-sidecar-injector-service-account -n istio-system # kubeflow oc adm policy add-scc-to-user anyuid system:serviceaccount:kubeflow:default oc adm policy add-scc-to-user anyuid system:serviceaccount:kubeflow:admission-webhook-service-account oc adm policy add-scc-to-user anyuid system:serviceaccount:kubeflow:katib-controller oc adm policy add-scc-to-user anyuid system:serviceaccount:kubeflow:katib-ui oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:ml-pipeline oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:pipeline-runner oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:admission-webhook-service-account oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:application-controller-service-account oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:builder oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:centraldashboard oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:default oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:deployer oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:jupyter-web-app-service-account oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:argo oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:argo-ui oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:metadata-ui oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:ml-pipeline oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:ml-pipeline-persistenceagent oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:ml-pipeline-ui oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:ml-pipeline-viewer-crd-service-account oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:notebook-controller-service-account oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:profiles-controller-service-account oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:spartakus oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:tf-job-dashboard oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:tf-job-operator oc adm policy add-role-to-user cluster-admin system:serviceaccount:kubeflow:pytorch-operator oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:ml-pipeline oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:pipeline-runner oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:admission-webhook-service-account oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:application-controller-service-account oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:builder oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:centraldashboard oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:default oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:deployer oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:jupyter-web-app-service-account oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:argo oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:argo-ui oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:metadata-ui oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:ml-pipeline oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:ml-pipeline-persistenceagent oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:ml-pipeline-ui oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:ml-pipeline-viewer-crd-service-account oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:notebook-controller-service-account oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:profiles-controller-service-account oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:spartakus oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:tf-job-dashboard oc adm policy add-scc-to-user privileged system:serviceaccount:kubeflow:tf-job-operator
The instructions are available in the existing_arrikto config docs (https://www.kubeflow.org/docs/started/k8s/kfctl-existing-arrikto/). We copy them here for the sake of reproducibility.
# Download the kfctl binary wget 'https://github.com/kubeflow/kubeflow/releases/download/v0.7.0-rc.6/kfctl_v0.7.0-rc.5-7-gc66ebff3_linux.tar.gz' tar -xvf kfctl_v0.7.0-rc.5-7-gc66ebff3_linux.tar.gz # Add kfctl to PATH, to make the kfctl binary easier to use. # Use only alphanumeric characters or - in the directory name. export PATH=$PATH:"<path-to-kfctl>" # Set the following kfctl configuration file: export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_existing_arrikto.0.7.0.yaml" # Set KF_NAME to the name of your Kubeflow deployment. You also use this # value as directory name when creating your configuration directory. # For example, your deployment name can be 'my-kubeflow' or 'kf-test'. export KF_NAME=<your choice of name for the Kubeflow deployment> # Set the path to the base directory where you want to store one or more # Kubeflow deployments. For example, /opt/. # Then set the Kubeflow application directory for this deployment. export BASE_DIR=<path to a base directory> export KF_DIR=${BASE_DIR}/${KF_NAME} mkdir -p ${KF_DIR} cd ${KF_DIR} # Download the config file and change the default login credentials. wget -O kfctl_existing_arrikto.yaml $CONFIG_URI export CONFIG_FILE=${KF_DIR}/kfctl_existing_arrikto.yaml # Credentials for the default user are [email protected]:12341234 # To change them, please edit the dex-auth application parameters # inside the KfDef file. vim $CONFIG_FILE kfctl apply -V -f ${CONFIG_FILE}
-
Add permissions for notebooks/finalizers on
notebook-controller-role
ClusterRole.
oc edit clusterrole notebook-controller-role -n kubeflow
-
Add permissions for workflow delete and workflows/finalizers on
argo
ClusterRole.
oc edit clusterrole argo -n kubeflow
-
Add permissions for experiments, trials and suggestions finalizers on
katib-controller
ClusterRole.
oc edit clusterrole katib-controller -n kubeflow
-
Add pods, pods/status, pods/finalizers resources on
tf-job-operator
ClusterRole.
oc edit clusterrole tf-job-operator
-
After installing, you may notice that some istio Pods are in CrashLoopBackoff. This happens when Istio Pods don’t have enough memory and end up getting OOMKilled. To fix it, please allocate more RAM to those Pods by editing their deployments. A proposed value is 256Mi for requests and 512Mi for limits.
oc edit deployment istio-ingressgateway -n istio-system oc edit deployment istio-egressgateway -n istio-system oc edit deployment istio-pilot -n istio-system oc edit deployment istio-policy -n istio-system ...
-
When creating a notebook, you may notice that it can’t assign the fsGroup it desires. To give it the necessary permissions, add it to the nonroot scc:
NS=<ns> oc adm policy add-scc-to-user anyuid system:serviceaccount:${NS}:default-editor oc adm policy add-scc-to-user privileged -z default-editor -n ${NS}
After setting up everything, you can connect to Kubeflow by exposing the istio-ingressgateway Service.
oc expose service istio-ingressgateway --port 80 -n istio-system
Then you can access Kubeflow at: http://istio-ingressgateway-istio-system.apps-crc.testing
You can also expose the ingressgateway via port-forward:
oc port-forward -n istio-system svc/istio-ingressgateway 8080:80
If you run CRC in a VM, you can use a SOCKS5 proxy to access the Kubeflow website:
ssh -D 127.0.0.1:12345 <user>@<public-ip> google-chrome --incognito --user-data-dir=/tmp/delme --proxy-server=socks5://127.0.0.1:12345 --dns-prefetch-disable
oc edit cm workflow-controller-configmap -n kubeflow containerRuntimeExecutor: pns
See info on PNS (Process Namespace Sharing)
at https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace
On RHEL 8.2: yum install @python36 sudo pip3 install https://storage.googleapis.com/ml-pipeline/release/latest/kfp.tar.gz --upgrade wget https://raw.githubusercontent.com/kubeflow/pipelines/master/samples/contrib/volume_ops/volumeop_sequential.py dsl-compile --py volumeop_sequential.py --output volumeop.tar.gz
Upload the compiled pipeline (volumeop.tar.gz) to Kubeflow
Run the volumeop pipeline and validate that everything works
oc get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE katib-mysql Bound pv0014 10Gi RWO,ROX,RWX 5h43m metadata-mysql Bound pv0015 10Gi RWO,ROX,RWX 5h43m minio-pv-claim Bound pv0003 20Gi RWO,ROX,RWX 5h43m mypvc Bound pv0024 10Gi RWO,ROX,RWX 110m mysql-pv-claim Bound pv0002 20Gi RWO,ROX,RWX 5h43m newpvc Bound pv0017 10Gi RWO,ROX,RWX 4h29m volumeop-sequential-2zz4q-newpvc Bound pv0022 10Gi RWO,ROX,RWX 5h17m volumeop-sequential-52whw-newpvc Bound pv0028 10Gi RWO,ROX,RWX 45m volumeop-sequential-k9tvz-newpvc Bound pv0016 10Gi RWO,ROX,RWX 13m volumeop-sequential-qls6g-newpvc Bound pv0018 10Gi RWO,ROX,RWX 33m
$ ssh -i ~/.crc/machines/crc/id_rsa [email protected] Red Hat Enterprise Linux CoreOS 42.80.20191010.0 [core@crc-847lc-master-0 ~]$ cd /mnt/pv-data/ [core@crc-847lc-master-0 pv-data]$ ls pv0001 pv0003 pv0005 pv0007 pv0009 pv0011 pv0013 pv0015 pv0017 pv0019 pv0021 pv0023 pv0025 pv0027 pv0029 pv0002 pv0004 pv0006 pv0008 pv0010 pv0012 pv0014 pv0016 pv0018 pv0020 pv0022 pv0024 pv0026 pv0028 pv0030 [core@crc-847lc-master-0 pv-data]$ cd pv0016 [core@crc-847lc-master-0 pv0016]$ ls file1 file2 [core@crc-847lc-master-0 pv0016]$ cat file1 1
See short video at https://youtu.be/zMQFOjrfhrU
git clone https://github.com/kubeflow/katib.git cd katib/examples/v1alpha3
oc logs pytorchjob-example-z6qzbgv6-master-0 -c metrics-collector .... I1117 00:31:19.116058 23 main.go:78] Train Epoch: 1 [28160/60000 (47%)] loss=0.0890 I1117 00:31:25.963848 23 main.go:78] Train Epoch: 1 [28800/60000 (48%)] loss=0.0514 I1117 00:31:35.932306 23 main.go:78] Train Epoch: 1 [29440/60000 (49%)] loss=0.1868 ....