[Docs] Databricks Agent Doc #4008

Merged
merged 30 commits into from
Dec 2, 2023
9bf1fed
Add Agent Service Doc in Databircricks Plugin
Sep 4, 2023
9beab93
add secret
Sep 4, 2023
7b9de10
Update rsts/deployment/plugins/webapi/databricks.rst
Future-Outlier Sep 9, 2023
e4c2c13
Update rsts/deployment/plugins/webapi/databricks.rst
Future-Outlier Sep 9, 2023
da88a03
Update rsts/deployment/plugins/webapi/databricks.rst
Future-Outlier Sep 9, 2023
deb7bd0
Update rsts/deployment/plugins/webapi/databricks.rst
Future-Outlier Sep 9, 2023
ad6e5c7
Merge branch 'flyteorg:master' into revise-agent-doc
Future-Outlier Sep 9, 2023
61ff2a3
update agent service docs
Sep 9, 2023
a6d51af
Merge branch 'master' of https://github.com/Future-Outlier/flyte into…
Sep 26, 2023
cda9d69
move to agent doc
Sep 26, 2023
ff6ce66
Merge branch 'master' of https://github.com/Future-Outlier/flyte into…
Oct 3, 2023
fdeff32
Merge branch 'master' of https://github.com/Future-Outlier/flyte into…
Oct 4, 2023
7bef497
databricks agent doc
Oct 4, 2023
a1353e7
Merge branch 'master' into revise-agent-doc
Future-Outlier Oct 4, 2023
83ab801
Update rsts/deployment/agents/index.rst
Future-Outlier Oct 5, 2023
7ee994d
Merge branch 'flyteorg:master' into revise-agent-doc
Future-Outlier Oct 10, 2023
26b786e
add supported task type
Oct 10, 2023
925b0a9
make bash command the same format
Oct 12, 2023
7736f5a
fix bash too big
Oct 13, 2023
6fba648
kevin update
pingsutw Nov 8, 2023
f8397a0
nit
pingsutw Nov 8, 2023
fa138b3
update docs fix error
Nov 8, 2023
8a903ac
Merge branch 'revise-agent-doc' of https://github.com/Future-Outlier/…
Nov 8, 2023
0ac6e0d
final version
Nov 8, 2023
22771d6
Merge branch 'master' of https://github.com/Future-Outlier/flyte into…
Nov 11, 2023
7fc5cea
Merge branch 'master' into revise-agent-doc
Future-Outlier Nov 11, 2023
4fcc2d3
Merge branch 'master' into revise-agent-doc
pingsutw Nov 11, 2023
a0aad2f
Merge branch 'master' of https://github.com/Future-Outlier/flyte into…
Nov 28, 2023
9981a11
improvement
Nov 28, 2023
64b0a54
Merge branch 'master' into revise-agent-doc
pingsutw Dec 2, 2023
342 changes: 342 additions & 0 deletions rsts/deployment/agents/databricks.rst
@@ -0,0 +1,342 @@
.. _deployment-agent-setup-databricks:

Databricks Agent
=================

This guide provides an overview of how to set up the Databricks agent in your Flyte deployment.

Spin up a cluster
-----------------

.. tabs::

  .. group-tab:: Flyte binary

    You can spin up a demo cluster using the following command:

    .. code-block:: bash

      flytectl demo start

    Or install Flyte using the :ref:`flyte-binary helm chart <deployment-deployment-cloud-simple>`.

  .. group-tab:: Flyte core

    If you've installed Flyte using the
    `flyte-core helm chart <https://github.com/flyteorg/flyte/tree/master/charts/flyte-core>`__, please ensure:

    * You have the correct kubeconfig and have selected the correct Kubernetes context.
    * You have configured the correct flytectl settings in ``~/.flyte/config.yaml``.

.. note::

  Add the Flyte chart repo to Helm if you're installing via the Helm charts.

  .. code-block:: bash

    helm repo add flyteorg https://flyteorg.github.io/flyte

Databricks workspace
--------------------

To set up your Databricks account, follow these steps:

1. Create a `Databricks account <https://www.databricks.com/>`__.
2. Ensure that you have a Databricks workspace up and running.
3. Generate a `personal access token
   <https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-token-authentication>`__ to be used in the Flyte configuration.
   You can find the personal access token in the user settings within the workspace.

.. note::

  When testing the Databricks agent on the demo cluster, create an S3 bucket, because the demo
  cluster's default storage is a local MinIO instance. Follow the `AWS instructions
  <https://docs.aws.amazon.com/powershell/latest/userguide/pstools-appendix-sign-up.html>`__
  to generate access and secret keys, which can be used to access your preferred S3 bucket.

Create an `instance profile
<https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html>`__
for the Spark cluster. This profile enables the Spark job to access your data in the S3 bucket.
Please follow all four steps specified in the documentation.

Upload the following entrypoint.py file to either
`DBFS <https://docs.databricks.com/archive/legacy/data-tab.html>`__
(the final path can be ``dbfs:///FileStore/tables/entrypoint.py``) or S3.
This file will be executed by the Spark driver node, overriding the default command in the
`dbx <https://docs.databricks.com/dev-tools/dbx.html>`__ job.

.. TODO: A quick-and-dirty workaround to resolve https://github.com/flyteorg/flyte/issues/3853 issue is to import pandas.

.. code-block:: python

  import os
  import sys
  from typing import List

  import click
  import pandas  # Imported only as a workaround for https://github.com/flyteorg/flyte/issues/3853
  from flytekit.bin.entrypoint import fast_execute_task_cmd as _fast_execute_task_cmd
  from flytekit.bin.entrypoint import execute_task_cmd as _execute_task_cmd
  from flytekit.exceptions.user import FlyteUserException
  from flytekit.tools.fast_registration import download_distribution


  def fast_execute_task_cmd(additional_distribution: str, dest_dir: str, task_execute_cmd: List[str]):
      # Download the fast-registered code, defaulting to the current working directory.
      if additional_distribution is not None:
          if not dest_dir:
              dest_dir = os.getcwd()
          download_distribution(additional_distribution, dest_dir)

      # Insert the call to fast before the unbounded resolver args
      cmd = []
      for arg in task_execute_cmd:
          if arg == "--resolver":
              cmd.extend(["--dynamic-addl-distro", additional_distribution, "--dynamic-dest-dir", dest_dir])
          cmd.append(arg)

      # Parse the rebuilt command (minus the leading program name) and run the task in-process.
      click_ctx = click.Context(click.Command("dummy"))
      parser = _execute_task_cmd.make_parser(click_ctx)
      args, _, _ = parser.parse_args(cmd[1:])
      _execute_task_cmd.callback(test=False, **args)


  def main():
      args = sys.argv

      # args[1] names the flytekit command to run; the remaining items are its arguments.
      click_ctx = click.Context(click.Command("dummy"))
      if args[1] == "pyflyte-fast-execute":
          parser = _fast_execute_task_cmd.make_parser(click_ctx)
          args, _, _ = parser.parse_args(args[2:])
          fast_execute_task_cmd(**args)
      elif args[1] == "pyflyte-execute":
          parser = _execute_task_cmd.make_parser(click_ctx)
          args, _, _ = parser.parse_args(args[2:])
          _execute_task_cmd.callback(test=False, dynamic_addl_distro=None, dynamic_dest_dir=None, **args)
      else:
          raise FlyteUserException(f"Unrecognized command: {args[1:]}")


  if __name__ == '__main__':
      main()
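
The argument splicing that ``fast_execute_task_cmd`` performs can be tried in isolation. The following is a minimal sketch, not part of the entrypoint itself: the helper name ``splice_fast_args`` and the sample command, paths, and resolver arguments are hypothetical and only illustrate where the two ``--dynamic-*`` flags end up relative to ``--resolver``.

```python
from typing import List


def splice_fast_args(task_execute_cmd: List[str], additional_distribution: str, dest_dir: str) -> List[str]:
    """Return a copy of the command with the fast-registration flags placed just before --resolver."""
    cmd: List[str] = []
    for arg in task_execute_cmd:
        if arg == "--resolver":
            # Everything after --resolver is consumed by the resolver, so the
            # extra flags have to be inserted ahead of it.
            cmd.extend(["--dynamic-addl-distro", additional_distribution, "--dynamic-dest-dir", dest_dir])
        cmd.append(arg)
    return cmd


# Hypothetical command line, used only to show the resulting ordering.
example = [
    "pyflyte-execute",
    "--inputs", "s3://my-bucket/inputs.pb",
    "--resolver", "my.resolver", "task-module", "my_module",
]
spliced = splice_fast_args(example, "s3://my-bucket/fast.tar.gz", "/tmp/work")
print(spliced)
```

The arguments that precede ``--resolver`` keep their order, and the resolver arguments remain at the end of the list.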

Specify agent configuration
----------------------------

.. tabs::

  .. group-tab:: Flyte binary

    .. tabs::

      .. group-tab:: Demo cluster

        Enable the Databricks agent on the demo cluster by adding the following config to ``~/.flyte/sandbox/config.yaml``:

        .. code-block:: yaml
          :emphasize-lines: 7,12

          tasks:
            task-plugins:
              default-for-task-types:
                container: container
                container_array: k8s-array
                sidecar: sidecar
                spark: agent-service
              enabled-plugins:
                - container
                - sidecar
                - k8s-array
                - agent-service
          plugins:
            databricks:
              entrypointFile: dbfs:///FileStore/tables/entrypoint.py
              databricksInstance: <DATABRICKS_ACCOUNT>.cloud.databricks.com
            k8s:
              default-env-vars:
                - FLYTE_AWS_ACCESS_KEY_ID: <AWS_ACCESS_KEY_ID>
                - FLYTE_AWS_SECRET_ACCESS_KEY: <AWS_SECRET_ACCESS_KEY>
                - AWS_DEFAULT_REGION: <AWS_REGION>
          remoteData:
            region: <AWS_REGION>
            scheme: aws
            signedUrls:
              durationMinutes: 3
          propeller:
            rawoutput-prefix: s3://<S3_BUCKET_NAME>/
          storage:
            container: "<S3_BUCKET_NAME>"
            type: s3
            stow:
              kind: s3
              config:
                region: <AWS_REGION>
                disable_ssl: true
                v2_signing: false
                auth_type: accesskey
                access_key_id: <AWS_ACCESS_KEY_ID>
                secret_key: <AWS_SECRET_ACCESS_KEY>
            signedURL:
              stowConfigOverride:
                endpoint: ""

        Substitute ``<DATABRICKS_ACCOUNT>`` with the name of your Databricks account,
        ``<AWS_REGION>`` with the region where you created your AWS bucket,
        ``<AWS_ACCESS_KEY_ID>`` with your AWS access key ID,
        ``<AWS_SECRET_ACCESS_KEY>`` with your AWS secret access key,
        and ``<S3_BUCKET_NAME>`` with the name of your S3 bucket.

      .. group-tab:: Helm chart

        Edit the relevant YAML file to specify the plugin.

        .. code-block:: yaml
          :emphasize-lines: 7,11

          tasks:
            task-plugins:
              enabled-plugins:
                - container
                - sidecar
                - k8s-array
                - agent-service
              default-for-task-types:
                - container: container
                - container_array: k8s-array
                - spark: agent-service

        .. code-block:: yaml
          :emphasize-lines: 3-5

          inline:
            plugins:
              databricks:
                entrypointFile: dbfs:///FileStore/tables/entrypoint.py
                databricksInstance: <DATABRICKS_ACCOUNT>.cloud.databricks.com

        Substitute ``<DATABRICKS_ACCOUNT>`` with the name of your Databricks account.

  .. group-tab:: Flyte core

    Create a file named ``values-override.yaml`` and add the following config to it:

    .. code-block:: yaml
      :emphasize-lines: 9,14,15-21

      configmap:
        enabled_plugins:
          tasks:
            task-plugins:
              enabled-plugins:
                - container
                - sidecar
                - k8s-array
                - agent-service
              default-for-task-types:
                container: container
                sidecar: sidecar
                container_array: k8s-array
                spark: agent-service
      databricks:
        enabled: True
        plugin_config:
          plugins:
            databricks:
              entrypointFile: dbfs:///FileStore/tables/entrypoint.py
              databricksInstance: <DATABRICKS_ACCOUNT>.cloud.databricks.com

    Substitute ``<DATABRICKS_ACCOUNT>`` with the name of your Databricks account.
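
If you prefer to fill in the ``<PLACEHOLDER>`` values from a script rather than by hand, something along these lines may help. It is only a sketch: the helper ``fill_placeholders`` and the value ``my-workspace`` are made up for illustration, and the template is a shortened fragment of the config shown above. The helper fails loudly when a placeholder has no value, so a half-substituted file is never written.

```python
import re
from typing import Dict

# Matches placeholders written as <UPPER_CASE_NAME>, as used in this guide.
PLACEHOLDER = re.compile(r"<([A-Z0-9_]+)>")


def fill_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace every <NAME> in the template, raising KeyError if a value is missing."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder <{name}>")
        return values[name]

    return PLACEHOLDER.sub(replace, template)


# Shortened fragment of the plugin config; "my-workspace" is a hypothetical account name.
template = """\
plugins:
  databricks:
    entrypointFile: dbfs:///FileStore/tables/entrypoint.py
    databricksInstance: <DATABRICKS_ACCOUNT>.cloud.databricks.com
"""

rendered = fill_placeholders(template, {"DATABRICKS_ACCOUNT": "my-workspace"})
print(rendered)
```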

Add the Databricks access token
-------------------------------

Add the Databricks access token to the Flyte configuration as follows.

1. Install the flyteagent pod using Helm.

   .. code-block:: bash

     cd flyte/charts/flyteagent

     helm install flyteagent . -n flyte

2. Get the base64 value of your Databricks token.

   .. code-block:: bash

     echo -n "<DATABRICKS_TOKEN>" | base64

3. Edit the flyteagent secret and set ``flyte_databricks_access_token`` to the base64-encoded token.

   .. code-block:: bash

     kubectl edit secret flyteagent -n flyte

   .. code-block:: yaml
     :emphasize-lines: 3

     apiVersion: v1
     data:
       flyte_databricks_access_token: <BASE64_ENCODED_DATABRICKS_TOKEN>
       username: User
     kind: Secret
     metadata:
       annotations:
         meta.helm.sh/release-name: flyteagent
         meta.helm.sh/release-namespace: flyte
       creationTimestamp: "2023-10-04T04:09:03Z"
       labels:
         app.kubernetes.io/managed-by: Helm
       name: flyteagent
       namespace: flyte
       resourceVersion: "753"
       uid: 5ac1e1b6-2a4c-4e26-9001-d4ba72c39e54
     type: Opaque
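
The base64 step can also be done in Python, which makes it easy to confirm that the encoded value decodes back to the original token and carries no trailing newline (the reason the shell command uses ``echo -n``). This is a small sketch: the helper names and the sample token are hypothetical, and a real token should not be hard-coded.

```python
import base64


def encode_secret_value(token: str) -> str:
    """Base64-encode a token for the `data` field of a Kubernetes Secret."""
    return base64.b64encode(token.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a value read back from the Secret, to check what was stored."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical token, for illustration only.
sample_token = "dapi-example-token"
encoded = encode_secret_value(sample_token)
print(encoded)

# Round trip: the decoded value must match the token exactly.
assert decode_secret_value(encoded) == sample_token
```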


Upgrade the deployment
----------------------

.. tabs::

  .. group-tab:: Flyte binary

    .. tabs::

      .. group-tab:: Demo cluster

        .. code-block:: bash

          kubectl rollout restart deployment flyte-sandbox -n flyte

      .. group-tab:: Helm chart

        .. code-block:: bash

          helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>

        Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
        ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
        and ``<YOUR_YAML_FILE>`` with the name of your YAML file.

  .. group-tab:: Flyte core

    .. code-block:: bash

      helm upgrade <RELEASE_NAME> flyteorg/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml

    Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
    and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).

Wait for the upgrade to complete. You can check the status of the deployment pods by running the following command:

.. code-block:: bash

  kubectl get pods -n flyte

.. note::

  Make sure you enable `custom containers
  <https://docs.databricks.com/administration-guide/clusters/container-services.html>`__
  on your Databricks cluster before you trigger the workflow.
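
To turn the pod-status check above into a scripted readiness test, the table printed by ``kubectl get pods`` can be parsed. The sketch below works on captured text only and does not call ``kubectl``; the helper names and the pod names in ``sample_output`` are invented for illustration, and treating ``Running`` and ``Completed`` as the only healthy states is an assumption.

```python
from typing import Dict, List


def parse_pod_status(kubectl_output: str) -> Dict[str, str]:
    """Map pod name to STATUS from the default table output of `kubectl get pods`."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    statuses: Dict[str, str] = {}
    for row in lines[1:]:
        columns = row.split()
        statuses[columns[name_idx]] = columns[status_idx]
    return statuses


def pods_not_ready(statuses: Dict[str, str]) -> List[str]:
    """Return the names of pods whose status is neither Running nor Completed."""
    return sorted(name for name, status in statuses.items() if status not in ("Running", "Completed"))


# Invented sample output; real pod names and ages will differ.
sample_output = """\
NAME                             READY   STATUS              RESTARTS   AGE
flyte-sandbox-7f9c6d5b8-abcde    1/1     Running             0          2m
flyteagent-5d8f7c9b6-fghij       0/1     ContainerCreating   0          10s
"""

statuses = parse_pod_status(sample_output)
print(pods_not_ready(statuses))
```

An empty list from ``pods_not_ready`` would indicate that every listed pod has reached one of the two accepted states.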
11 changes: 11 additions & 0 deletions rsts/deployment/agents/index.rst
@@ -28,10 +28,21 @@ Discover the process of setting up Agents for Flyte.
   Guide to setting up the MMCloud agent.


   ---

   .. link-button:: deployment-agent-setup-databricks
      :type: ref
      :text: Databricks Agent
      :classes: btn-block stretched-link
   ^^^^^^^^^^^^
   Guide to setting up the Databricks agent.

.. toctree::
   :maxdepth: 1
   :name: Agent setup
   :hidden:

   bigquery
   mmcloud
   databricks