This project is experimental. Expect the API to change. It is not recommended for production environments.
Kubernetes offers the facility of extending its API through the concept of Operators. This repository contains the resources and code to deploy an Azure Databricks Operator for Kubernetes.
The Databricks operator is useful when Kubernetes-hosted applications need to launch and use Databricks data engineering and machine learning tasks.
- Easy to use: Azure Databricks operations can be performed with kubectl; there is no need to learn or install the Databricks command-line utilities and their Python dependencies.
- Security: There is no need to distribute and use Databricks tokens; the Databricks token is used by the operator.
- Version control: All YAML manifests or Helm charts that define Azure Databricks operations (clusters, jobs, …) can be tracked.
- Automation: Replicate Azure Databricks operations on any Databricks workspace by applying the same manifests or Helm charts.
The project was built using
- Download the latest release manifests:
wget https://github.com/microsoft/azure-databricks-operator/releases/latest/download/release.zip
unzip release.zip
- Create the azure-databricks-operator-system namespace:
kubectl create namespace azure-databricks-operator-system
- Create Kubernetes secrets with values for DATABRICKS_HOST and DATABRICKS_TOKEN:
kubectl --namespace azure-databricks-operator-system \
create secret generic dbrickssettings \
--from-literal=DatabricksHost="https://xxxx.azuredatabricks.net" \
--from-literal=DatabricksToken="xxxxx"
- Apply the manifests for the Operator and CRDs in release/config:
kubectl apply -f release/config
For a detailed deployment guide, please see deploy.md.
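The quick-start steps above can also be collected into a small script. The following is a minimal sketch, not part of the project: it assumes release.zip has already been unpacked into ./release, that DATABRICKS_HOST and DATABRICKS_TOKEN are set in the shell, and that kubectl points at the target cluster. The function name install_operator and the final pod listing are illustrative additions.

```shell
#!/bin/sh
# Sketch: scripted version of the quick-start steps.
# Assumes ./release/config exists and kubectl targets the right cluster.
set -eu

NAMESPACE="azure-databricks-operator-system"

install_operator() {
  # Fail early with a clear message if either value is missing.
  : "${DATABRICKS_HOST:?DATABRICKS_HOST must be set}"
  : "${DATABRICKS_TOKEN:?DATABRICKS_TOKEN must be set}"

  # Namespace that holds the operator and its secret.
  kubectl create namespace "$NAMESPACE"

  # Secret read by the operator; key names match the quick-start command.
  kubectl --namespace "$NAMESPACE" \
    create secret generic dbrickssettings \
    --from-literal=DatabricksHost="$DATABRICKS_HOST" \
    --from-literal=DatabricksToken="$DATABRICKS_TOKEN"

  # Operator deployment and CRDs.
  kubectl apply -f release/config

  # List the pods so a failed rollout is visible right away.
  kubectl --namespace "$NAMESPACE" get pods
}
```

Call install_operator after exporting the two variables. Because of set -e, the script stops at the first failing command, for example if the namespace already exists.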
- Create a Spark cluster on demand and run a Databricks notebook.
- Create an interactive Spark cluster and run a Databricks job on an existing cluster.
- Create an Azure Databricks secret scope by using Kubernetes secrets.
For samples and simple use cases showing how to use the operator, please see samples.md.
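Whichever use case is chosen, the workflow is the same: apply a manifest, then read the resource back to see what the operator did with it. The sketch below assumes a manifest file taken from the samples; the function name run_sample is hypothetical, and no particular resource kind or field is assumed because kubectl reads both from the file.

```shell
#!/bin/sh
# Sketch: apply one sample manifest and inspect the result.
# The manifest path is supplied by the caller.
set -eu

run_sample() {
  manifest="${1:?usage: run_sample <manifest.yaml>}"

  # Create or update the resources described in the manifest.
  kubectl apply -f "$manifest"

  # Print the same resources back, including any status set on them.
  kubectl get -f "$manifest" -o yaml
}
```

For example, run_sample my-sample.yaml, where my-sample.yaml is a placeholder file name. The resources can be removed afterwards with kubectl delete -f on the same file.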
One-click start by using VS Code
For more details please see contributing.md
Check roadmap.md for what has been supported and what's coming.
A few topics are discussed in resources.md:
- Dev container
- Build pipelines
- Operator metrics
- Kubernetes on WSL
For instructions about setting up your environment to develop and extend the operator, please see contributing.md
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.