reorganize sqlite3 user guide example (flyteorg#300)
* reorganize sqlite3 user guide example

move from extending_flyte to integrations/flytekit_plugins

Signed-off-by: cosmicBboy <[email protected]>

* update title

Signed-off-by: cosmicBboy <[email protected]>

* update dolt card text

Signed-off-by: cosmicBboy <[email protected]>

* add sql-alchemy

Signed-off-by: Samhita Alla <[email protected]>

* readme

Signed-off-by: Samhita Alla <[email protected]>

* lint code

Signed-off-by: Samhita Alla <[email protected]>

* modify sqlalchemy

Signed-off-by: Samhita Alla <[email protected]>

* update content

Signed-off-by: Samhita Alla <[email protected]>

* update example

Signed-off-by: Samhita Alla <[email protected]>

* sqlalchemy remote example

Signed-off-by: Samhita Alla <[email protected]>

* code updates

Signed-off-by: Samhita Alla <[email protected]>

Co-authored-by: Samhita Alla <[email protected]>
cosmicBboy and samhita-alla authored Jul 20, 2021
1 parent d6bc023 commit 52b63dc
Showing 14 changed files with 406 additions and 31 deletions.
6 changes: 2 additions & 4 deletions cookbook/docs/conf.py
@@ -236,9 +236,8 @@ def __call__(self, filename):
"../core/containerization",
"../deployment",
# "../control_plane", # TODO: add content to this section
# "../integrations/flytekit_plugins/sqllite3", # TODO: add content to this section
"../integrations/flytekit_plugins/sql",
"../integrations/flytekit_plugins/papermilltasks",
# "../integrations/flytekit_plugins/sqlalchemy", # TODO: add content to this section
"../integrations/flytekit_plugins/pandera",
"../integrations/flytekit_plugins/dolt",
"../integrations/kubernetes/pod",
@@ -264,9 +263,8 @@ def __call__(self, filename):
"auto/deployment",
# "auto/deployment/guides", # TODO: add content to this section
# "auto/control_plane", # TODO: add content to this section
# "auto/integrations/flytekit_plugins/sqllite3", # TODO: add content to this section
"auto/integrations/flytekit_plugins/sql",
"auto/integrations/flytekit_plugins/papermilltasks",
# "auto/integrations/flytekit_plugins/sqlalchemy", # TODO: add content to this section
"auto/integrations/flytekit_plugins/pandera",
"auto/integrations/flytekit_plugins/dolt",
"auto/integrations/kubernetes/pod",
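Both hunks above touch parallel lists in `conf.py`: one of source directories (`../...`) and one of generated gallery directories (`auto/...`), and the commit updates them in lockstep. A minimal sketch of a consistency check follows. It assumes the lists are named `examples_dirs` and `gallery_dirs` (the diff shows only the entries, not the variable names) and that each gallery path is the source path with `../` replaced by `auto/`; the helper is hypothetical and not part of `conf.py`.

```python
# Assumed variable names: the diff shows list entries only.
examples_dirs = [
    "../integrations/flytekit_plugins/sql",
    "../integrations/flytekit_plugins/papermilltasks",
    "../integrations/flytekit_plugins/pandera",
    "../integrations/flytekit_plugins/dolt",
    "../integrations/kubernetes/pod",
]
gallery_dirs = [
    "auto/integrations/flytekit_plugins/sql",
    "auto/integrations/flytekit_plugins/papermilltasks",
    "auto/integrations/flytekit_plugins/pandera",
    "auto/integrations/flytekit_plugins/dolt",
    "auto/integrations/kubernetes/pod",
]


def expected_gallery_dir(example_dir: str) -> str:
    """Map '../x/y' to 'auto/x/y' (hypothetical helper for illustration)."""
    prefix = "../"
    if not example_dir.startswith(prefix):
        raise ValueError(f"unexpected example dir: {example_dir}")
    return "auto/" + example_dir[len(prefix):]


# Collect every pair whose gallery path does not follow the assumed convention.
mismatches = [
    (src, dst)
    for src, dst in zip(examples_dirs, gallery_dirs)
    if expected_gallery_dir(src) != dst
]
print(mismatches)
```

An empty `mismatches` list suggests the two lists were edited consistently; a length check on both lists would additionally catch an entry added to only one of them.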
21 changes: 18 additions & 3 deletions cookbook/docs/flytekit_plugins.rst
@@ -18,6 +18,15 @@ You can find the plugins maintained by the core flyte team `here <https://github
.. panels::
:header: text-center

.. link-button:: auto/integrations/flytekit_plugins/sql/index
:type: ref
:text: SQL
:classes: btn-block stretched-link
^^^^^^^^^^^^
Execute SQL queries as tasks.

---

.. link-button:: auto/integrations/flytekit_plugins/papermilltasks/index
:type: ref
:text: Papermill
@@ -34,16 +43,22 @@ You can find the plugins maintained by the core flyte team `here <https://github
^^^^^^^^^^^^
Validate pandas dataframes with ``pandera``.

---

.. link-button:: auto/integrations/flytekit_plugins/dolt/index
:type: ref
:text: Dolt
:classes: btn-block stretched-link
^^^^^^^^^^^^
Version your SQL database with ``dolt``.

.. TODO: add the following items to the TOC when the content is written.
.. - auto/integrations/flytekit_plugins/sqllite3/index
.. - auto/integrations/flytekit_plugins/sqlalchemy/index

.. toctree::
:maxdepth: -1
:caption: Contents
:hidden:

auto/integrations/flytekit_plugins/sql/index
auto/integrations/flytekit_plugins/papermilltasks/index
auto/integrations/flytekit_plugins/pandera/index
auto/integrations/flytekit_plugins/dolt/index
31 changes: 31 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/Dockerfile
@@ -0,0 +1,31 @@
FROM python:3.8-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY sql/requirements.txt /root/.
RUN pip install -r /root/requirements.txt

# Copy the makefile targets to expose on the container. This makes it easier to register.
COPY in_container.mk /root/Makefile
COPY sql/sandbox.config /root

# Copy the actual code
COPY sql/ /root/sql/

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
3 changes: 3 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/Makefile
@@ -0,0 +1,3 @@
PREFIX=sql
include ../../../common/Makefile
include ../../../common/leaf.mk
7 changes: 7 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/README.rst
@@ -0,0 +1,7 @@
###
SQL
###

Flyte tasks are not always restricted to running user-supplied containers, nor even containers at all. Indeed, this is
one of the most important design decisions in Flyte. Non-container tasks can have arbitrary targets for execution, such as
an API that executes SQL queries (Snowflake, BigQuery), a synchronous web API, etc.
Empty file.
2 changes: 2 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/requirements.in
@@ -0,0 +1,2 @@
-r ../../../common/requirements-common.in
flytekitplugins-sqlalchemy>=0.20.1
146 changes: 146 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/requirements.txt
@@ -0,0 +1,146 @@
#
# This file is autogenerated by pip-compile with python 3.8
# To update, run:
#
# /Library/Developer/CommandLineTools/usr/bin/make requirements.txt
#
attrs==21.2.0
# via scantree
certifi==2021.5.30
# via requests
charset-normalizer==2.0.2
# via requests
click==7.1.2
# via flytekit
croniter==1.0.15
# via flytekit
cycler==0.10.0
# via matplotlib
dataclasses-json==0.5.4
# via flytekit
decorator==5.0.9
# via retry
deprecated==1.2.12
# via flytekit
dirhash==0.2.1
# via flytekit
docker-image-py==0.1.10
# via flytekit
flyteidl==0.19.13
# via flytekit
flytekit==0.20.1
# via
# -r ../../../common/requirements-common.in
# flytekitplugins-sqlalchemy
flytekitplugins-sqlalchemy==0.20.1
# via -r requirements.in
greenlet==1.1.0
# via sqlalchemy
grpcio==1.38.1
# via flytekit
idna==3.2
# via requests
importlib-metadata==4.6.1
# via keyring
keyring==23.0.1
# via flytekit
kiwisolver==1.3.1
# via matplotlib
marshmallow==3.12.2
# via
# dataclasses-json
# marshmallow-enum
# marshmallow-jsonschema
marshmallow-enum==1.5.1
# via dataclasses-json
marshmallow-jsonschema==0.12.0
# via flytekit
matplotlib==3.4.2
# via -r ../../../common/requirements-common.in
mypy-extensions==0.4.3
# via typing-inspect
natsort==7.1.1
# via flytekit
numpy==1.21.0
# via
# matplotlib
# pandas
# pyarrow
pandas==1.3.0
# via flytekit
pathspec==0.8.1
# via scantree
pillow==8.3.1
# via matplotlib
protobuf==3.17.3
# via
# flyteidl
# flytekit
py==1.10.0
# via retry
pyarrow==3.0.0
# via flytekit
pyparsing==2.4.7
# via matplotlib
python-dateutil==2.8.1
# via
# croniter
# flytekit
# matplotlib
# pandas
python-json-logger==2.0.1
# via flytekit
pytimeparse==1.1.8
# via flytekit
pytz==2018.4
# via
# flytekit
# pandas
regex==2021.7.6
# via docker-image-py
requests==2.26.0
# via
# flytekit
# responses
responses==0.13.3
# via flytekit
retry==0.9.2
# via flytekit
scantree==0.0.1
# via dirhash
six==1.16.0
# via
# cycler
# flytekit
# grpcio
# protobuf
# python-dateutil
# responses
# scantree
sortedcontainers==2.4.0
# via flytekit
sqlalchemy==1.4.21
# via flytekitplugins-sqlalchemy
statsd==3.3.0
# via flytekit
stringcase==1.2.0
# via dataclasses-json
typing-extensions==3.10.0.0
# via typing-inspect
typing-inspect==0.7.1
# via dataclasses-json
urllib3==1.26.6
# via
# flytekit
# requests
# responses
wheel==0.36.2
# via
# -r ../../../common/requirements-common.in
# flytekit
wrapt==1.12.1
# via
# deprecated
# flytekit
zipp==3.5.0
# via importlib-metadata
3 changes: 3 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/sandbox.config
@@ -0,0 +1,3 @@
[sdk]
workflow_packages=sql
python_venv=flytekit_venv
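`sandbox.config` uses INI syntax, so its contents can be inspected with Python's standard `configparser`. This is only a sketch for reading the file shown above: flytekit has its own configuration loader, and treating `workflow_packages` as a comma-separated list is an assumption made here for illustration.

```python
import configparser

# Contents of sandbox.config as added in this commit.
SANDBOX_CONFIG = """
[sdk]
workflow_packages=sql
python_venv=flytekit_venv
"""

parser = configparser.ConfigParser()
parser.read_string(SANDBOX_CONFIG)

# Assumption: multiple packages would be separated by commas.
workflow_packages = [
    name.strip()
    for name in parser["sdk"]["workflow_packages"].split(",")
    if name.strip()
]
python_venv = parser["sdk"]["python_venv"]

print(workflow_packages)
print(python_venv)
```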
45 changes: 45 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/sql-alchemy-remote.py
@@ -0,0 +1,45 @@
"""
SQLAlchemy
----------
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
Flyte provides an easy-to-use interface to SQLAlchemy for connecting to a variety of SQL databases.
The SQLAlchemy task runs in a pre-built container, so users don't need to build one themselves.
"""

# %%
# Let's import the libraries.
import pandas
from flytekit import kwtypes, task, workflow
from flytekitplugins.sqlalchemy import SQLAlchemyConfig, SQLAlchemyTask


# %%
# We define a ``SQLAlchemyTask`` that fetches a limited number of records from a table, and a task that returns the length of the resulting DataFrame.
#
# .. note::
#
# The output of SQLAlchemyTask is a :py:class:`~flytekit.types.schema.FlyteSchema` by default.
@task
def get_length(df: pandas.DataFrame) -> int:
    return len(df)


sql_task = SQLAlchemyTask(
    name="sqlalchemy_task",
    query_template="select * from <table> limit {{.inputs.limit}}",
    inputs=kwtypes(limit=int),
    task_config=SQLAlchemyConfig(uri="<uri>"),
)


@workflow
def my_wf(limit: int) -> int:
    return get_length(df=sql_task(limit=limit))


if __name__ == "__main__":
    print(f"Running {__file__} main...")
    print(my_wf(limit=3))
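The example above leaves `<table>` and `<uri>` as placeholders, so it cannot run as written. To illustrate what the `{{.inputs.limit}}` placeholder in `query_template` is meant to do, here is a standard-library sketch that substitutes the input into the query and runs it against an in-memory SQLite database. It does not use flytekit: `render_query` is a hypothetical helper that only mimics the placeholder syntax, and the `tracks` table and its rows are invented for illustration.

```python
import re
import sqlite3


def render_query(template: str, **inputs) -> str:
    """Replace each {{.inputs.<name>}} with the matching input value.

    Hypothetical helper: flytekit performs its own interpolation; this only
    mimics the placeholder syntax shown in the example.
    """
    return re.sub(
        r"\{\{\s*\.inputs\.(\w+)\s*\}\}",
        lambda match: str(inputs[match.group(1)]),
        template,
    )


# Throwaway in-memory table so the rendered query has something to read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tracks (name) VALUES (?)",
    [("a",), ("b",), ("c",), ("d",), ("e",)],
)

query = render_query("select * from tracks limit {{.inputs.limit}}", limit=3)
rows = conn.execute(query).fetchall()

print(query)
print(len(rows))
conn.close()
```

Because the value is spliced into the SQL text, declaring the input as `int` (as `kwtypes(limit=int)` does in the example) helps keep arbitrary strings out of the query.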