How to Extend Flyte doc with a flowchart #763

Merged
merged 13 commits into from
Feb 22, 2021
10 changes: 9 additions & 1 deletion rsts/howto/index.rst
@@ -18,7 +18,6 @@ How Do I...?
productionize/index
launchplans
managing_customizable_resources
plugins/index
enable_and_use_schedules
enable_backend_plugin
monitoring/index
@@ -32,3 +31,12 @@ How Do I...?
labels_annotations
notifications
serviceaccount


.. _howto_extend:

=======================
How do I extend Flyte?
=======================

Flyte was designed to be extensible. The :ref:`plugins` section covers extending Flyte in more detail and lists the available examples.
5 changes: 0 additions & 5 deletions rsts/howto/plugins/index.rst

This file was deleted.

5 changes: 5 additions & 0 deletions rsts/plugins/extend/flyte_backend.rst
@@ -0,0 +1,5 @@
.. _extend-plugin-flyte-backend:

########################################
Implement Backend Extensions (advanced)
########################################
5 changes: 5 additions & 0 deletions rsts/plugins/extend/flytekit_python.rst
@@ -0,0 +1,5 @@
.. _extend-plugin-flytekit-python:

##################################
Extend Flytekit (Python)
##################################
126 changes: 126 additions & 0 deletions rsts/plugins/extend/intro.rst
@@ -0,0 +1,126 @@
.. _plugins_extend_intro:

###########################
When & How to Extend Flyte
###########################

The core of Flyte is a container execution engine: you write one or more tasks and string them together into a data-dependency DAG called a ``workflow``.
If your work involves writing simple Python or Java tasks that either perform operations on their own or call out to external services, there is **no need to extend Flyte**.

But if you can do almost everything using Python, Java, or a container, why would you ever need to extend Flyte?

=================
But First - Why?
=================

Case 1: I want to use my own types - e.g. my own DataFrame format
==========================================================================
Like a programming language, Flyte has a core type system, and as in most languages, this type system can be extended by allowing users to add user-defined data types.
A user-defined data type can be something that Flyte does not natively understand but that is extremely useful for a user's specific needs. For example, it can be a custom user structure or a grouping of images in a specific encoding.

Flytekit natively supports structured data such as user-defined dataclasses, using JSON as the representation format. An example is available in FlyteCookbook - :std:doc:`auto_core_intermediate/custom_objects`.
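
The idea of using JSON as the representation format for a dataclass can be sketched with the standard library alone. This is a minimal illustration, not Flytekit's implementation: the ``Datum`` class and the ``to_json`` / ``from_json`` helpers are hypothetical names introduced here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Datum:
    """Hypothetical user-defined structure passed between tasks."""

    x: int
    y: str


def to_json(value: Datum) -> str:
    # Serialize the dataclass fields into a JSON document.
    return json.dumps(asdict(value))


def from_json(payload: str) -> Datum:
    # Rebuild the dataclass from the JSON document.
    return Datum(**json.loads(payload))


original = Datum(x=1, y="hello")
restored = from_json(to_json(original))
```

Because every field is itself JSON-representable, the round trip is lossless. Types that do not have this property are the ones that need a custom transformer.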

For types that are not simply representable as JSON documents, Flytekit allows users to extend Flyte's type system and implement these types in Python. The user essentially implements a :py:class:`flytekit.extend.TypeTransformer` class to translate between the user's type and the types that Flyte understands. For example,
instead of using :py:class:`pandas.DataFrame` directly, you may want to use `Pandera <https://pandera.readthedocs.io/en/stable/>`_ to validate an input or output dataframe. An example can be found `here <https://github.com/flyteorg/flytekit/blob/master/plugins/tests/pandera/test_wf.py#L9>`_.

To extend the type system in Flytekit, refer to the illustrative example at :std:ref:`advanced_custom_types`.
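
Conceptually, a type transformer is a pair of conversions registered for a Python type. The sketch below models that idea in plain Python without importing Flytekit. ``SketchTransformer``, ``ImageGroup``, ``REGISTRY`` and ``register`` are all hypothetical names, and the method signatures are simplified assumptions rather than the real ``TypeTransformer`` interface.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, Type, TypeVar

T = TypeVar("T")


class SketchTransformer(Generic[T]):
    """Hypothetical stand-in for a type transformer (not the flytekit class)."""

    def to_literal(self, value: T) -> Dict[str, Any]:
        raise NotImplementedError

    def to_python_value(self, literal: Dict[str, Any]) -> T:
        raise NotImplementedError


@dataclass(frozen=True)
class ImageGroup:
    """Hypothetical user type: a grouping of images in one encoding."""

    encoding: str
    paths: tuple


class ImageGroupTransformer(SketchTransformer[ImageGroup]):
    def to_literal(self, value: ImageGroup) -> Dict[str, Any]:
        # Translate the user's type into a form the engine understands.
        return {"encoding": value.encoding, "paths": list(value.paths)}

    def to_python_value(self, literal: Dict[str, Any]) -> ImageGroup:
        # Translate the engine's form back into the user's type.
        return ImageGroup(encoding=literal["encoding"], paths=tuple(literal["paths"]))


# Hypothetical registry mapping a Python type to its transformer.
REGISTRY: Dict[Type, SketchTransformer] = {}


def register(python_type: Type, transformer: SketchTransformer) -> None:
    REGISTRY[python_type] = transformer


register(ImageGroup, ImageGroupTransformer())

group = ImageGroup(encoding="png", paths=("a.png", "b.png"))
literal = REGISTRY[type(group)].to_literal(group)
round_tripped = REGISTRY[ImageGroup].to_python_value(literal)
```

The engine only ever sees the literal form, so the user's type can change internally as long as both conversions stay consistent.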


Case 2: Add a new Task Type - Flyte capability
===============================================
Often you want to interact with a service such as:

- a database (Postgres, MySQL, etc.)
- a data warehouse (Snowflake, BigQuery, Redshift, etc.)
- a computation platform (AWS EMR, Databricks, etc.)

and you want this to be available as a template for all other users, whether open source or within your organization. This can be done by creating a task plugin.
A task plugin makes it possible for you or other users to use your idea natively within Flyte, as if this capability were built into the Flyte platform.

For example, if you want users to write code simply using the ``@task`` decorator, but you want to provide the capability of running the function as a Spark job or a SageMaker training job, you can extend Flyte's task system. We will refer to this as the plugin, and it makes the following possible:
Contributor

So one point of feedback we've received is that task_config is confusing as a name. Can we call out explicitly that task plugins are implemented with custom task configs?

Contributor Author

@katrogan I dont know if the ship has sailed, but what would you call it rather. this is configuration right?

Contributor

sure, i guess configuration just has multiple interpretations and perhaps this doesn't really belong in how to create plugins anyways. i can add a PR for going over that after this one


.. code-block:: python

    @task(task_config=MyContainerExecutionTask(
        plugin_specific_config_a=...,
        plugin_specific_config_b=...,
        ...
    ))
    def foo(...) -> ...:
        ...


Or provide an interface like this:

.. code-block:: python

    # SnowflakeQuery is illustrative; kwtypes is flytekit's helper for declaring named input types.
    query_task = SnowflakeQuery(
        query="Select * from x where x.time < {{.inputs.time}}",
        inputs=kwtypes(time=datetime),
        results=pandas.DataFrame,
    )

    @workflow
    def my_wf(t: datetime) -> ...:
        df = query_task(time=t)
        return process(df=df)
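
How a ``task_config`` object can select a plugin is easier to see in a small standalone model. The sketch below does not use Flytekit: the ``task`` decorator, ``PLUGINS`` registry, ``SparkLikeConfig``, ``PlainTask`` and ``SparkLikeTask`` are hypothetical names, and the dispatch-by-config-type rule is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass
class SparkLikeConfig:
    """Hypothetical plugin-specific configuration."""

    executors: int


class PlainTask:
    """Default behaviour: call the wrapped function directly."""

    def __init__(self, fn, config=None):
        self.fn = fn
        self.config = config

    def __call__(self, **kwargs):
        return self.fn(**kwargs)


class SparkLikeTask(PlainTask):
    def __call__(self, **kwargs):
        # A real plugin would submit a job; this sketch only tags the result.
        return {"executors": self.config.executors, "result": self.fn(**kwargs)}


# Hypothetical registry: the type of the config object selects the task class.
PLUGINS: Dict[Type, Type[PlainTask]] = {SparkLikeConfig: SparkLikeTask}


def task(task_config=None):
    def wrap(fn):
        task_cls = PLUGINS.get(type(task_config), PlainTask)
        return task_cls(fn, task_config)

    return wrap


@task(task_config=SparkLikeConfig(executors=4))
def add(a: int, b: int) -> int:
    return a + b


@task()
def double(a: int) -> int:
    return 2 * a
```

The user's function body stays ordinary Python. Only the configuration object decides which execution behaviour wraps it, which is why task plugins are described in terms of custom task configs.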



===========================================================
I want to write a Task Plugin or add a new TaskType
===========================================================

Interestingly, there are two options here. You can write a task plugin simply as an extension in Flytekit, or you can go deeper and write a plugin in the Flyte backend itself.

Flytekit-only plugin
======================
An illustrative example of writing a Flytekit plugin can be found at :std:ref:`advanced_custom_task_plugin`. Flytekit plugins are simple to write and should invariably be
the first place you start. Here are the trade-offs:

**Pros**

#. Simple to write: just implement it in Python. Flyte treats it like a container execution and blindly passes control to the plugin.
#. Simple to publish: flytekitplugins can be published as independent libraries, and they follow a simple API.
#. Simple to test: just test locally in Flytekit.

**Cons**

#. Limited ways of providing additional visibility into progress, external links, etc.
#. Has to be implemented again in every language, as these are SDK-side plugins only.
#. Side effects can potentially cause resource leaks. For example, if the plugin runs a BigQuery job, the plugin may crash after starting the job, and Flyte cannot guarantee that the BigQuery job will be successfully terminated.
#. Potentially expensive: when the plugin just runs a remote job (as Airflow does, for example), running a new pod for every task execution puts severe strain on Kubernetes while the task itself uses almost no CPU. Also, because of the plugin's stateful nature, using spot instances is not trivial.
#. A bug fix to the runtime requires a new library version of the plugin.
#. Resource controls such as throttling and resource pooling are not trivial to implement.
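
The resource-leak concern can be made concrete with a small sketch. It assumes a hypothetical in-memory ``FakeJobClient`` standing in for a remote service such as a query engine; none of these names come from Flytekit or any real client library.

```python
class FakeJobClient:
    """Hypothetical in-memory stand-in for a remote job service."""

    def __init__(self, fail: bool):
        self.fail = fail
        self.running = set()
        self._next_id = 0

    def submit(self, query: str) -> int:
        self._next_id += 1
        self.running.add(self._next_id)
        return self._next_id

    def wait(self, job_id: int) -> str:
        if self.fail:
            raise RuntimeError("simulated failure while waiting")
        self.running.discard(job_id)
        return "done"

    def cancel(self, job_id: int) -> None:
        self.running.discard(job_id)


def run_query(client: FakeJobClient, query: str) -> str:
    job_id = client.submit(query)
    try:
        return client.wait(job_id)
    except BaseException:
        # Best-effort cleanup: this runs for Python-level errors only.
        # If the process or pod is killed, nothing cancels the remote job.
        client.cancel(job_id)
        raise
```

An SDK-side plugin can only clean up while its own process is alive, so a hard crash between ``submit`` and ``cancel`` leaves the remote job running. This is the gap the text describes.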

Backend Plugin
===============

Documentation on how to write a backend plugin is coming soon. A backend plugin essentially makes it possible for users to write extensions for FlytePropeller (Flyte's scheduling engine). This enables complete control over the visualization and availability of the plugin.

**Pros**

#. Service-oriented way of deploying new plugins, with strong contracts. Maintainers can deploy new versions of the backend plugin and fix bugs without requiring users to upgrade libraries.
#. Drastically cheaper and more efficient to execute. FlytePropeller is written in Go and uses an event-loop model. Each FlytePropeller process can execute thousands of tasks concurrently.
#. Flyte guarantees resource cleanup.
#. Flyteconsole plugins (capability coming soon) can be added to customize visualization and progress tracking of the execution.
#. Resource controls and backpressure management are available.
#. Implement once, use in any SDK or language.

**Cons**

#. Needs to be implemented in Go.
#. Needs a FlytePropeller build - *currently*.
#. Needs a contract implemented in a specification language such as protobuf or OpenAPI.
#. The development cycle can be much slower than for Flytekit-only plugins.


===============================================
How do I decide which path to take?
===============================================

.. image:: https://raw.githubusercontent.com/flyteorg/flyte/static-resources/img/core/extend_flyte_flowchart.png
   :alt: OK, you want to add a plugin, but which type? Follow the flowchart and then select the right next steps.


Use the conclusion of the flowchart to refer to the right doc
================================================================

- :ref:`extend-plugin-flytekit-python`
- :ref:`extend-plugin-flyte-backend`
26 changes: 23 additions & 3 deletions rsts/plugins/index.rst
@@ -1,9 +1,29 @@
.. _plugins:

####################
Available Plugins
####################
########################################
Extend Flyte and Available Extensions
########################################

.. _plugins_howto:

====================
How to extend Flyte
====================
Flyte as a platform was designed with extensibility as a core primitive. Flyte is essentially an integration framework, and hence extensibility is possible throughout the system.
The following sections guide you through writing your own extensions, either private or public (contributed back to the community).

.. toctree::
   :maxdepth: 1
   :name: howtoextendtoc

   extend/intro
   extend/flytekit_python
   extend/flyte_backend


====================
Available Extensions
====================
Following is a list of maintained plugins for Flyte and guides on how to install / use them.

.. toctree::