Databricks plugin #3142
Merged
Changes from 8 of 12 commits

Commits:
70278b2 Databricks plugin (pingsutw)
9c1eb06 add databricks page (pingsutw)
d7b25d3 update (pingsutw)
05d09c8 update (pingsutw)
fd34ff1 merged master (pingsutw)
4dc6f4c update entrypoint (pingsutw)
2919763 update helm (pingsutw)
d1f1bf9 nit (pingsutw)
e044895 Merge remote-tracking branch 'origin/master' into databricks (wild-endeavor)
ac4b388 Add gist (pingsutw)
87e9fd2 nit (pingsutw)
f004f09 nit (pingsutw)
@@ -107,6 +107,7 @@ helm install gateway bitnami/contour -n flyte
 | configmap.task_logs.plugins.logs.cloudwatch-enabled | bool | `false` | One option is to enable cloudwatch logging for EKS, update the region and log group accordingly |
 | configmap.task_resource_defaults | object | `{"task_resources":{"defaults":{"cpu":"100m","memory":"500Mi","storage":"500Mi"},"limits":{"cpu":2,"gpu":1,"memory":"1Gi","storage":"20Mi"}}}` | Task default resources configuration Refer to the full [structure](https://pkg.go.dev/github.com/lyft/[email protected]/pkg/runtime/interfaces#TaskResourceConfiguration). |
 | configmap.task_resource_defaults.task_resources | object | `{"defaults":{"cpu":"100m","memory":"500Mi","storage":"500Mi"},"limits":{"cpu":2,"gpu":1,"memory":"1Gi","storage":"20Mi"}}` | Task default resources parameters |
+| databricks | object | `{"enabled":false,"plugin_config":{"plugins":{"databricks":{"databricksInstance":"dbc-a53b7a3c-614c","entrypointFile":"dbfs:///FileStore/tables/entrypoint.py"}}}}` | Optional: Databricks Plugin allows us to run the spark job on the Databricks platform. |
 | datacatalog.affinity | object | `{}` | affinity for Datacatalog deployment |
 | datacatalog.configPath | string | `"/etc/datacatalog/config/*.yaml"` | Default regex string for searching configuration files |
 | datacatalog.enabled | bool | `true` | |

@@ -0,0 +1,158 @@
.. _deployment-plugin-setup-webapi-databricks:

Databricks Plugin Setup
-----------------------

This guide gives an overview of how to set up Databricks in your Flyte deployment.

1. Add the Flyte chart repo to Helm.

   .. code-block:: bash

      helm repo add flyteorg https://flyteorg.github.io/flyte

2. Set up the cluster.

   .. tabbed:: Sandbox

      * Start the sandbox cluster

        .. code-block:: bash

           flytectl sandbox start

      * Generate the Flytectl sandbox config

        .. code-block:: bash

           flytectl config init

   .. tabbed:: AWS/GCP

      * Make sure you have an up and running Flyte cluster in `AWS <https://docs.flyte.org/en/latest/deployment/aws/index.html#deployment-aws>`__ / `GCP <https://docs.flyte.org/en/latest/deployment/gcp/index.html#deployment-gcp>`__.
      * Make sure you have the correct kubeconfig and have selected the correct Kubernetes context.
      * Make sure you have the correct Flytectl config at ``~/.flyte/config.yaml``.

3. Upload an ``entrypoint.py`` file to DBFS or S3. The Spark driver node runs this file to override the default command of the Databricks job.

   .. code-block:: python

      # entrypoint.py
      import os
      import sys
      from typing import List

      import click
      from flytekit.bin.entrypoint import fast_execute_task_cmd as _fast_execute_task_cmd
      from flytekit.bin.entrypoint import execute_task_cmd as _execute_task_cmd
      from flytekit.exceptions.user import FlyteUserException
      from flytekit.tools.fast_registration import download_distribution


      def fast_execute_task_cmd(additional_distribution: str, dest_dir: str, task_execute_cmd: List[str]):
          # Download the fast-registration distribution before executing the task.
          if additional_distribution is not None:
              if not dest_dir:
                  dest_dir = os.getcwd()
              download_distribution(additional_distribution, dest_dir)

          # Insert the dynamic-distro args before the unbounded resolver args.
          cmd = []
          for arg in task_execute_cmd:
              if arg == "--resolver":
                  cmd.extend(["--dynamic-addl-distro", additional_distribution, "--dynamic-dest-dir", dest_dir])
              cmd.append(arg)

          click_ctx = click.Context(click.Command("dummy"))
          parser = _execute_task_cmd.make_parser(click_ctx)
          args, _, _ = parser.parse_args(cmd[1:])
          _execute_task_cmd.callback(**args)


      def main():
          # Dispatch on the first CLI argument, mirroring flytekit's
          # pyflyte-fast-execute / pyflyte-execute console entrypoints.
          args = sys.argv

          click_ctx = click.Context(click.Command("dummy"))
          if args[1] == "pyflyte-fast-execute":
              parser = _fast_execute_task_cmd.make_parser(click_ctx)
              args, _, _ = parser.parse_args(args[2:])
              fast_execute_task_cmd(**args)
          elif args[1] == "pyflyte-execute":
              parser = _execute_task_cmd.make_parser(click_ctx)
              args, _, _ = parser.parse_args(args[2:])
              _execute_task_cmd.callback(**args)
          else:
              raise FlyteUserException(f"Unrecognized command: {args[1:]}")


      if __name__ == '__main__':
          main()
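
   The Databricks job then invokes this file with the serialized Flyte task command as its arguments. As a rough, hypothetical illustration of the fast-execute path (not taken from the plugin source; every value below is a placeholder):

   .. code-block:: python

      # Hypothetical driver-side invocation of entrypoint.py; all values
      # are placeholders, and flytekit must be installed on the cluster.
      import subprocess

      subprocess.run(
          [
              "python", "entrypoint.py",
              "pyflyte-fast-execute",  # argv[1] selects the branch in main()
              "--additional-distribution", "s3://my-bucket/pkg.tar.gz",  # placeholder
              "--dest-dir", ".",
              "pyflyte-execute",  # remaining task arguments elided
          ],
          check=True,
      )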

4. Create a file named ``values-override.yaml`` and add the following config to it:

   .. code-block:: yaml

      configmap:
        enabled_plugins:
          # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
          tasks:
            # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
            task-plugins:
              # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend
              # plugins
              enabled-plugins:
                - container
                - sidecar
                - k8s-array
                - databricks
              default-for-task-types:
                container: container
                sidecar: sidecar
                container_array: k8s-array
                spark: databricks
      databricks:
        enabled: True
        plugin_config:
          plugins:
            databricks:
              entrypointFile: dbfs:///FileStore/tables/entrypoint.py
              databricksInstance: dbc-a53b7a3c-614c
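
   With ``spark: databricks`` in ``default-for-task-types``, Spark tasks are handed to the Databricks plugin. As a hedged sketch of what such a task can look like on the flytekit side (assuming the ``Databricks`` task config from ``flytekitplugins-spark``; the cluster settings below are illustrative values, not recommendations):

   .. code-block:: python

      from flytekit import task
      from flytekitplugins.spark import Databricks  # assumes flytekitplugins-spark is installed


      @task(
          task_config=Databricks(
              spark_conf={"spark.driver.memory": "1000M"},
              # databricks_conf is passed through to the Databricks Jobs API;
              # the values below are illustrative.
              databricks_conf={
                  "run_name": "flyte databricks example",
                  "new_cluster": {
                      "spark_version": "11.0.x-scala2.12",
                      "node_type_id": "r3.xlarge",
                      "num_workers": 4,
                  },
                  "timeout_seconds": 3600,
                  "max_retries": 1,
              },
          )
      )
      def hello_spark() -> str:
          return "hello from Databricks"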

5. Create a Databricks account and follow the docs for creating an access token.

6. Create an `Instance Profile <https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html>`_ for the Spark cluster; it allows the Spark job to access your data in the S3 bucket.

7. Add the Databricks access token to FlytePropeller.

   .. note::

      Refer to the `access token docs <https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens>`__ to understand how to set up a Databricks personal access token.

   .. code-block:: bash

      kubectl edit secret -n flyte flyte-secret-auth

   The configuration will look as follows:

   .. code-block:: yaml

      apiVersion: v1
      data:
        FLYTE_DATABRICKS_API_TOKEN: <ACCESS_TOKEN>
        client_secret: Zm9vYmFy
      kind: Secret
      metadata:
        annotations:
          meta.helm.sh/release-name: flyte
          meta.helm.sh/release-namespace: flyte
      ...

   Replace ``<ACCESS_TOKEN>`` with your access token, base64-encoded (values under ``data`` in a Kubernetes Secret must be base64-encoded).
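
   One way to produce the encoded value (the token below is a made-up placeholder):

   .. code-block:: python

      # Print the base64 string to paste into the Secret's ``data`` field.
      import base64

      token = "dapi1234567890abcdef"  # placeholder, not a real token
      print(base64.b64encode(token.encode("utf-8")).decode("utf-8"))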

8. Upgrade the Flyte Helm release.

   .. code-block:: bash

      helm upgrade -n flyte -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml flyteorg/flyte-core

Review comment: is there an example of this file?