Run databricks task locally #1951
Conversation
Signed-off-by: Kevin Su <[email protected]>
Codecov Report

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##           master    #1951      +/-   ##
    ==========================================
    - Coverage   85.91%   85.90%    -0.02%
    ==========================================
      Files         306      306
      Lines       22818    22867       +49
      Branches     3466     3470        +4
    ==========================================
    + Hits        19605    19644       +39
    - Misses       2622     2629        +7
    - Partials      591      594        +3
Here's how I try to specify the path:

    if __name__ == '__main__':
        runner = CliRunner()
        result = runner.invoke(
            pyflyte.main,
            ["run",
             "--raw-output-data-prefix",
             "s3://flyte-batch/spark/",
             "/mnt/c/code/dev/example/plugins/databricks_wf",
             "wf"])
        print(result.output)

Can you explain how to set the environment variables?

    # for flyte s3 minio
    export FLYTE_AWS_ENDPOINT="http://localhost:30080/"
    export FLYTE_AWS_ACCESS_KEY_ID="minio"
    export FLYTE_AWS_SECRET_ACCESS_KEY="miniostorage"

Error message:

    (dev) root@googler:/mnt/c/code/dev/example/plugins# python databricks_wf.py
    Running Execution on local.
    Failed with Exception Code: USER:AssertionError
    Underlying Exception: Not Found
    Failed to put data from /tmp/tmpbfgzko5e/script_mode.tar.gz to s3://flyte-batch/spark/025c7d20ac403c3c26629b35c0bca000/script_mode.tar.gz (recursive=False).
    Original exception: Not Found
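If the shell exports are inconvenient, the same variables can be set from Python before the CliRunner is invoked. A minimal sketch, assuming the same minio values as the exports above:

```python
import os

# Set the Flyte S3 (minio) credentials in-process, mirroring the shell
# exports above; do this before pyflyte/CliRunner runs so flytekit's
# data persistence layer can reach the minio endpoint.
os.environ["FLYTE_AWS_ENDPOINT"] = "http://localhost:30080/"
os.environ["FLYTE_AWS_ACCESS_KEY_ID"] = "minio"
os.environ["FLYTE_AWS_SECRET_ACCESS_KEY"] = "miniostorage"
```

The "Not Found" error above typically means the endpoint or bucket is unreachable with the configured credentials, so double-check these values first.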
How to Setup

Let's say you have a Python file called …

0. Databricks setup

(0) Set up your workspace.
(1) Enable BYOC (bring your own container):

    curl -X PATCH -n \
      -H "Authorization: Bearer <your-personal-access-token>" \
      https://<databricks-instance>/api/2.0/workspace-conf \
      -d '{
        "enableDcs": "true"
      }'

Note: remember to use the token; this isn't written in the docs.

(2) You can browse your dbfs in the workspace.
(3) Instance profile.

1. Build your Dockerfile (will support ImageSpec in the future)

    FROM databricksruntime/standard:13.3-LTS
    LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytesnacks
    ENV PYTHONPATH /databricks/driver
    ENV PATH="/databricks/python3/bin:$PATH"
    USER 0
    RUN sudo apt-get update && sudo apt-get install -y make build-essential libssl-dev git
    RUN /databricks/python3/bin/pip install git+https://github.com/Future-Outlier/flytekit.git@master#subdirectory=plugins/flytekit-spark
    RUN /databricks/python3/bin/pip install markupsafe==2.0.0
    COPY flyte-example/databricks_wf.py /databricks/driver/
    WORKDIR /databricks/driver
    ENV PYTHONPATH /databricks/driver

Build the image:

    docker build -t pingsutw/databricks:v7 .

Note: you have to put your …

2. Run the code

Locally: (wait for Kevin's reply)

Remotely: you can use

    pyflyte register databricks_wf.py --version DB-FIRST
    pyflyte register --non-fast databricks_wf.py --version DB-SECOND

Now, you can run it!
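For reference, the BYOC curl call above can also be prepared from Python. A stdlib-only sketch that builds (but does not send) the same PATCH request; `build_enable_dcs_request` is a hypothetical helper name, not part of any Databricks SDK:

```python
import json
import urllib.request


def build_enable_dcs_request(host: str, token: str) -> urllib.request.Request:
    # Construct the same PATCH to /api/2.0/workspace-conf as the curl
    # command above; caller decides when/whether to send it with urlopen.
    data = json.dumps({"enableDcs": "true"}).encode()
    return urllib.request.Request(
        f"{host}/api/2.0/workspace-conf",
        data=data,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )


req = build_enable_dcs_request("https://<databricks-instance>", "<token>")
```

Sending it would be `urllib.request.urlopen(req)` against a real workspace host and token.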
Is the …
This is the command to run it locally:

    pyflyte --verbose run --raw-output-data-prefix s3://flyte-batch/spark/ flyte-example/databricks_wf.py wf
I thought about it some more; after deliberation I think this is confusing.

    class AgentFunctionTaskExecutor:
        ...
        def execute(self):
            if ctx.raw_output_prefix is local:
                raise AssertionError(
                    f"Using agent {self.name} locally needs to have a way to pass "
                    "the data/code from local to remote. This needs the configuration "
                    "of a common shared blob store like S3, GCS, etc., which can be "
                    "set using `--raw-output-prefix` in `pyflyte run`. If you want to "
                    "run the task code locally without invoking the remote service "
                    "(e.g. testing), use the `--local-agent-emulation` flag in "
                    "`pyflyte run`.")
            # ... continue to execution
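The guard sketched above can be made concrete. A runnable sketch under my own assumptions (the helper name and the set of remote URI schemes are illustrative, not the PR's actual implementation):

```python
def check_raw_output_prefix(prefix: str, agent_name: str) -> None:
    # Hypothetical guard: an agent submitting work to a remote service
    # needs the raw-output prefix to point at a shared blob store,
    # not a local filesystem path.
    remote_schemes = ("s3://", "gs://", "abfs://", "abfss://")
    if not prefix.startswith(remote_schemes):
        raise AssertionError(
            f"Using agent {agent_name} locally needs a shared blob store "
            "(e.g. S3, GCS) passed via `--raw-output-data-prefix`."
        )


check_raw_output_prefix("s3://flyte-batch/spark/", "databricks")  # passes silently
```

A local path such as `/tmp/outputs` would raise the AssertionError instead.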
    def execute(self, **kwargs) -> Any:
        if isinstance(self.task_config, Databricks):
            # Since we only have databricks agent
            return AsyncAgentExecutorMixin.execute(self, **kwargs)
would this also automatically invoke the local method?
LGTM.
Also, can I try and test this?
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Future Outlier <[email protected]>
Signed-off-by: Rafael Raposo <[email protected]>
TL;DR

This PR allows submitting a Databricks job from a local machine and saving intermediate data in the blob store. It simplifies the process of testing and developing Databricks tasks locally.

There are two ways to run the Databricks job locally:

1. Run the Spark task in the local process:

    pyflyte run databricks.py wf

2. Submit to the Databricks platform; falls back to 1 (local execution) if the agent raises an exception:

    pyflyte run --raw-output-data-prefix s3://databricks-agent/demo databricks.py wf

Note: To submit a job from local, you need AWS credentials and a Databricks access key in the environment variables.
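The fall-back behaviour described in option 2 can be sketched like this (the function names are illustrative stand-ins, not the PR's actual API):

```python
def run_databricks_task(task, submit_remote, run_local):
    # Try the Databricks agent first; fall back to local execution
    # if the agent raises, mirroring behaviour 2 above.
    try:
        return submit_remote(task)
    except Exception:
        return run_local(task)


def failing_agent(task):
    # Stand-in for an agent that cannot reach the Databricks platform.
    raise RuntimeError("agent unavailable")


result = run_databricks_task("wf", failing_agent, lambda t: f"ran {t} locally")
# → "ran wf locally"
```

In the real code path the remote branch goes through the async agent executor, but the try/except fall-back shape is the same.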
Type
Are all requirements met?
Complete description
Tracking Issue
flyteorg/flyte#3936
Follow-up issue
NA