Databricks plugin #3142

Merged 12 commits on Dec 19, 2022
1 change: 1 addition & 0 deletions charts/flyte-core/README.md
@@ -107,6 +107,7 @@ helm install gateway bitnami/contour -n flyte
| configmap.task_logs.plugins.logs.cloudwatch-enabled | bool | `false` | One option is to enable cloudwatch logging for EKS, update the region and log group accordingly |
| configmap.task_resource_defaults | object | `{"task_resources":{"defaults":{"cpu":"100m","memory":"500Mi","storage":"500Mi"},"limits":{"cpu":2,"gpu":1,"memory":"1Gi","storage":"20Mi"}}}` | Task default resources configuration Refer to the full [structure](https://pkg.go.dev/github.com/lyft/[email protected]/pkg/runtime/interfaces#TaskResourceConfiguration). |
| configmap.task_resource_defaults.task_resources | object | `{"defaults":{"cpu":"100m","memory":"500Mi","storage":"500Mi"},"limits":{"cpu":2,"gpu":1,"memory":"1Gi","storage":"20Mi"}}` | Task default resources parameters |
| databricks | object | `{"enabled":false,"plugin_config":{"plugins":{"databricks":{"databricksInstance":"dbc-a53b7a3c-614c","entrypointFile":"dbfs:///FileStore/tables/entrypoint.py"}}}}` | Optional: Databricks Plugin allows you to run Spark jobs on the Databricks platform. |
| datacatalog.affinity | object | `{}` | affinity for Datacatalog deployment |
| datacatalog.configPath | string | `"/etc/datacatalog/config/*.yaml"` | Default regex string for searching configuration files |
| datacatalog.enabled | bool | `true` | |
5 changes: 5 additions & 0 deletions charts/flyte-core/templates/propeller/configmap.yaml
@@ -40,6 +40,11 @@ data:
{{- with .Values.sparkoperator.plugin_config }}
spark.yaml: | {{ tpl (toYaml .) $ | nindent 4 }}
{{- end }}
{{- end }}
{{- if .Values.databricks.enabled }}
{{- with .Values.databricks.plugin_config }}
databricks.yaml: | {{ tpl (toYaml .) $ | nindent 4 }}
{{- end }}
{{- end }}
storage.yaml: | {{ tpl (include "storage" .) $ | nindent 4 }}
cache.yaml: |
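
With ``databricks.enabled: true``, these template lines render the ``plugin_config`` subtree of the values file into a ``databricks.yaml`` key in the FlytePropeller configmap; with the chart defaults the rendered entry would look roughly like this sketch:

.. code-block:: yaml

plugins:
  databricks:
    entrypointFile: dbfs:///FileStore/tables/entrypoint.py
    databricksInstance: dbc-a53b7a3c-614c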
15 changes: 15 additions & 0 deletions charts/flyte-core/values.yaml
@@ -821,3 +821,18 @@ sparkoperator:
- spark.blacklist.enabled: "true"
- spark.blacklist.timeout: "5m"
- spark.task.maxfailures: "8"


# --------------------------------------------------------
# Optional Plugins
# --------------------------------------------------------

# -- Optional: Databricks Plugin allows you to run Spark jobs on the Databricks platform.
databricks:
  enabled: false
  plugin_config:
    plugins:
      databricks:
        entrypointFile: dbfs:///FileStore/tables/entrypoint.py

Review comment (Contributor): is there an example of this file?

        # Databricks account
        databricksInstance: dbc-a53b7a3c-614c
158 changes: 158 additions & 0 deletions rsts/deployment/plugin_setup/webapi/databricks.rst
@@ -0,0 +1,158 @@
.. _deployment-plugin-setup-webapi-databricks:

Databricks Plugin Setup
-----------------------

This guide gives an overview of how to set up Databricks in your Flyte deployment.

1. Add Flyte chart repo to Helm

.. code-block:: bash

helm repo add flyteorg https://flyteorg.github.io/flyte


2. Set up the cluster

.. tabbed:: Sandbox

* Start the sandbox cluster

.. code-block:: bash

flytectl sandbox start

* Generate Flytectl sandbox config

.. code-block:: bash

flytectl config init

.. tabbed:: AWS/GCP

* Make sure you have an up-and-running Flyte cluster in `AWS <https://docs.flyte.org/en/latest/deployment/aws/index.html#deployment-aws>`__ / `GCP <https://docs.flyte.org/en/latest/deployment/gcp/index.html#deployment-gcp>`__
* Make sure you have the correct kubeconfig and have selected the correct Kubernetes context
* Make sure you have the correct flytectl config at ``~/.flyte/config.yaml``
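
For example, you can confirm the active context and that the Flyte pods are healthy:

.. code-block:: bash

kubectl config current-context
kubectl get pods -n flyte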

3. Upload an ``entrypoint.py`` file to DBFS or S3. The Spark driver node runs this file to override the default command of the Databricks job.

.. code-block:: python

# entrypoint.py
import os
import sys
from typing import List

import click
from flytekit.bin.entrypoint import fast_execute_task_cmd as _fast_execute_task_cmd
from flytekit.bin.entrypoint import execute_task_cmd as _execute_task_cmd
from flytekit.exceptions.user import FlyteUserException
from flytekit.tools.fast_registration import download_distribution


def fast_execute_task_cmd(additional_distribution: str, dest_dir: str, task_execute_cmd: List[str]):
    if additional_distribution is not None:
        if not dest_dir:
            dest_dir = os.getcwd()
        download_distribution(additional_distribution, dest_dir)

    # Insert the call to fast before the unbounded resolver args
    cmd = []
    for arg in task_execute_cmd:
        if arg == "--resolver":
            cmd.extend(["--dynamic-addl-distro", additional_distribution, "--dynamic-dest-dir", dest_dir])
        cmd.append(arg)

    click_ctx = click.Context(click.Command("dummy"))
    parser = _execute_task_cmd.make_parser(click_ctx)
    args, _, _ = parser.parse_args(cmd[1:])
    _execute_task_cmd.callback(**args)


def main():
    args = sys.argv

    click_ctx = click.Context(click.Command("dummy"))
    if args[1] == "pyflyte-fast-execute":
        parser = _fast_execute_task_cmd.make_parser(click_ctx)
        args, _, _ = parser.parse_args(args[2:])
        fast_execute_task_cmd(**args)
    elif args[1] == "pyflyte-execute":
        parser = _execute_task_cmd.make_parser(click_ctx)
        args, _, _ = parser.parse_args(args[2:])
        _execute_task_cmd.callback(**args)
    else:
        raise FlyteUserException(f"Unrecognized command: {args[1:]}")


if __name__ == '__main__':
    main()
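
If the file lives on DBFS, one way to upload it is with the Databricks CLI; a sketch, assuming the CLI is installed and configured, with the destination matching your ``entrypointFile`` setting:

.. code-block:: bash

databricks fs cp entrypoint.py dbfs:/FileStore/tables/entrypoint.py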



4. Create a file named ``values-override.yaml`` and add the following config to it:

.. code-block:: yaml

configmap:
  enabled_plugins:
    # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
    tasks:
      # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
      task-plugins:
        # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend plugins.
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - databricks
        default-for-task-types:
          container: container
          sidecar: sidecar
          container_array: k8s-array
          spark: databricks
databricks:
  enabled: True
  plugin_config:
    plugins:
      databricks:
        entrypointFile: dbfs:///FileStore/tables/entrypoint-4.py
        databricksInstance: dbc-a53b7a3c-614c
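
To confirm the plugin configuration renders into the FlytePropeller configmap, you can template the chart locally (assuming ``flyte`` as the release name, matching the sandbox install):

.. code-block:: bash

helm template flyte flyteorg/flyte-core -n flyte -f values-override.yaml | grep -A 5 "databricks.yaml"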

5. Create a Databricks account and follow the docs to create an access token.

6. Create an `instance profile <https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html>`_ for the Spark cluster; it allows the Spark job to access your data in the S3 bucket.

7. Add the Databricks access token to FlytePropeller.

.. note::
Refer to the Databricks documentation on `personal access tokens <https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens>`__ for how to create one.

.. code-block:: bash

kubectl edit secret -n flyte flyte-secret-auth

The configuration will look as follows:

.. code-block:: yaml

apiVersion: v1
data:
  FLYTE_DATABRICKS_API_TOKEN: <ACCESS_TOKEN>
  client_secret: Zm9vYmFy
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
...

Replace ``<ACCESS_TOKEN>`` with your access token.
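
Note that values under ``data`` in a Kubernetes Secret must be base64-encoded, so encode the token first, for example:

.. code-block:: bash

echo -n "<your-databricks-token>" | base64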

8. Upgrade the Flyte Helm release.

.. code-block:: bash

helm upgrade flyte flyteorg/flyte-core -n flyte -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml
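
Once the release is upgraded, Spark tasks can be routed to Databricks from flytekit. A minimal sketch, assuming ``flytekitplugins-spark`` is installed and exposes the ``Databricks`` task config; the cluster settings below are illustrative only:

.. code-block:: python

import flytekit
from flytekit import task
from flytekitplugins.spark import Databricks


@task(
    task_config=Databricks(
        # Regular Spark settings passed to the job
        spark_conf={"spark.driver.memory": "1000M"},
        # Payload forwarded to the Databricks Jobs API (illustrative values)
        databricks_conf={
            "run_name": "flyte databricks example",
            "new_cluster": {
                "spark_version": "11.0.x-scala2.12",
                "node_type_id": "r3.xlarge",
                "num_workers": 4,
            },
            "timeout_seconds": 3600,
            "max_retries": 1,
        },
    )
)
def hello_spark() -> float:
    # The Spark session is provided by the plugin at run time
    sess = flytekit.current_context().spark_session
    return float(sess.sparkContext.parallelize(range(100)).count())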
8 changes: 8 additions & 0 deletions rsts/deployment/plugin_setup/webapi/index.rst
@@ -18,12 +18,20 @@ Web API Plugin Setup
^^^^^^^^^^^^
Guide to setting up the Snowflake Plugin.

.. link-button:: deployment-plugin-setup-webapi-databricks
:type: ref
:text: Databricks Plugin
:classes: btn-block stretched-link
^^^^^^^^^^^^
Guide to setting up the Databricks Plugin.


.. toctree::
:maxdepth: 1
:name: Web API plugin Setup
:hidden:

snowflake
databricks