All the Flytekit plugins maintained by the core team are added here. It is not necessary to add plugins here, but this is a good starting place.
Please file an issue.
Flytekit plugins are structured as micro-libs and can be authored in an independent repository.
Refer to the Python microlibs blog to understand the idea of microlibs.
The plugins maintained by the core team can be found in this repository and provide a simple way of discovery.
Plugins should have their own unit tests.
The following guidelines will help you write better Flytekit plugins.
- The folder name has to be `flytekit-*`, e.g., `flytekit-hive`. In case you want to group for a specific service, then use `flytekit-aws-athena`.
- Flytekit plugins use a concept called Namespace packages, and thus, the package structure is essential.

  Please use the following Python package structure:

      flytekit-myplugin/
         - README.md
         - setup.py
         - flytekitplugins/
             - myplugin/
                - __init__.py
         - tests
             - __init__.py

  NOTE: the inner package `flytekitplugins` DOES NOT have an `__init__.py` file.
- The published packages have to be named `flytekitplugins-{package-name}`, where `{package-name}` is a unique identifier for the plugin.
- The setup.py file has to have the following template. You can use it as is by editing the TODO sections.
from setuptools import setup
# TODO put the plugin name here
PLUGIN_NAME = "<plugin-name e.g. pandera>"
# TODO decide if the plugin is regular or `data`
# for regular plugins
microlib_name = f"flytekitplugins-{PLUGIN_NAME}"
# For data/persistence plugins
# microlib_name = f"flytekitplugins-data-{PLUGIN_NAME}"
# TODO add additional requirements
plugin_requires = ["flytekit>=1.1.0b0,<2.0.0", "<other requirements>"]
__version__ = "0.0.0+develop"
setup(
    name=microlib_name,
    version=__version__,
    author="flyteorg",
    author_email="[email protected]",
    # TODO Edit the description
    description="My awesome plugin.....",
    # TODO alter the last part of the following URL
    url="https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-...",
    long_description=open("README.md").read(),
    long_description_content_type="text/markdown",
    namespace_packages=["flytekitplugins"],
    packages=[f"flytekitplugins.{PLUGIN_NAME}"],
    install_requires=plugin_requires,
    license="apache2",
    python_requires=">=3.8",
    classifiers=[
        "Intended Audience :: Science/Research",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Topic :: Scientific/Engineering",
        "Topic :: Scientific/Engineering :: Artificial Intelligence",
        "Topic :: Software Development",
        "Topic :: Software Development :: Libraries",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    # TODO OPTIONAL
    # FOR Plugins where auto-loading on installation is desirable, please uncomment this line and ensure that the
    # __init__.py has the right modules available to be loaded, or point to the right module
    # entry_points={"flytekit.plugins": [f"{PLUGIN_NAME}=flytekitplugins.{PLUGIN_NAME}"]},
)
- Each plugin should have a README.md, which describes how to install it with a simple example. For example, refer to flytekit-greatexpectations' README.
- Each plugin should have its own tests package. NOTE: the `tests` folder should have an `__init__.py` file.
- There may be some cases where you might want to auto-load some of your modules when the plugin is installed. This is especially true for `data-plugins` and `type-plugins`. In such a case, you can add a special directive in the `setup.py` which will instruct Flytekit to automatically load the prescribed modules.

  The following shows an excerpt from the `flytekit-data-fsspec` plugin's setup.py file.

      setup(
          entry_points={"flytekit.plugins": [f"{PLUGIN_NAME}=flytekitplugins.{PLUGIN_NAME}"]},
      )
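The layout rules in the guidelines above can be checked mechanically before publishing. The following is a minimal sketch that uses only the Python standard library. The function name `check_plugin_layout` and its messages are hypothetical, introduced here for illustration; they are not part of Flytekit.

```python
from pathlib import Path
from typing import List


def check_plugin_layout(plugin_dir: Path, plugin_name: str) -> List[str]:
    """Return a list of layout problems; an empty list means every check passed.

    Hypothetical helper (not a Flytekit API). `plugin_name` is the name of the
    package that lives under the `flytekitplugins` namespace.
    """
    problems = []

    # The folder name has to be flytekit-*
    if not plugin_dir.name.startswith("flytekit-"):
        problems.append("folder name must start with 'flytekit-'")

    # README.md and setup.py live at the top level of the plugin folder
    for required in ("README.md", "setup.py"):
        if not (plugin_dir / required).is_file():
            problems.append(f"missing {required}")

    # The namespace package itself must NOT contain an __init__.py ...
    namespace = plugin_dir / "flytekitplugins"
    if (namespace / "__init__.py").exists():
        problems.append("flytekitplugins/ must not contain __init__.py")

    # ... but the plugin package inside it must
    if not (namespace / plugin_name / "__init__.py").is_file():
        problems.append(f"missing flytekitplugins/{plugin_name}/__init__.py")

    # The tests folder should have an __init__.py
    if not (plugin_dir / "tests" / "__init__.py").is_file():
        problems.append("missing tests/__init__.py")

    return problems
```

For example, `check_plugin_layout(Path("plugins/flytekit-myplugin"), "myplugin")` returns an empty list when the folder follows the structure described above.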
Currently we advocate pinning to minor releases of flytekit. To bump the pins across the board, `cd plugins/`, then update the command below with the appropriate range and run it:

    for f in $(ls **/setup.py); do sed -i "s/flytekit>.*,<1.1/flytekit>=1.1.0b0,<1.2/" $f; done

Try using `gsed` instead of `sed` if you are on a Mac. Note that this only works for setup files whose current pin matches the pattern in your sed command; some plugins may start out with different pins.
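If a portable alternative to `sed`/`gsed` is useful, the same bump can be sketched in Python. This is an illustration under stated assumptions, not a script shipped with the repository: the helper names `bump_pin` and `bump_all` are hypothetical, setup files are assumed to sit exactly one directory below `plugins/`, and the pattern is slightly stricter than the sed expression (it stops at the first comma or quote and will not treat `<1.10` as `<1.1`).

```python
import re
from pathlib import Path
from typing import List


def bump_pin(text: str, old_upper: str, new_spec: str) -> str:
    """Replace a `flytekit>...,<old_upper` requirement with `new_spec`.

    Hypothetical helper. Text whose pin does not end in `<old_upper`
    is returned unchanged.
    """
    pattern = re.compile(
        r"flytekit>[^\"',]*,<" + re.escape(old_upper) + r"(?![\d.])"
    )
    # A callable replacement avoids any backslash processing of new_spec.
    return pattern.sub(lambda _match: new_spec, text)


def bump_all(plugins_dir: Path, old_upper: str, new_spec: str) -> List[Path]:
    """Rewrite every <plugins_dir>/*/setup.py in place; return the changed files."""
    changed = []
    for setup_py in sorted(plugins_dir.glob("*/setup.py")):
        original = setup_py.read_text()
        updated = bump_pin(original, old_upper, new_spec)
        if updated != original:
            setup_py.write_text(updated)
            changed.append(setup_py)
    return changed
```

Run from the repository root, `bump_all(Path("plugins"), "1.1", "flytekit>=1.1.0b0,<1.2")` would mirror the sed command above. The returned list shows which plugins were updated, which makes it easier to spot plugins that started out with different pins.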
- Example of a simple Python task that allows adding only Python side functionality: flytekit-greatexpectations
- Example of a TypeTransformer or a Type Plugin: flytekit-pandera. These plugins add new types to Flyte, tell Flyte how to transform them, and add additional features through types. Flyte is a multi-language system, and type transformers allow marshaling between Flytekit, the backend, and other languages.
- Example of TaskTemplate plugin which also allows plugin writers to supply a prebuilt container for runtime: flytekit-sqlalchemy
- Example of a SQL backend plugin where the actual query invocation is done by a backend plugin: flytekit-snowflake
- Example of a Meta plugin that can wrap other tasks: flytekit-papermill
- Example of a plugin that modifies the execution command: flytekit-spark or flytekit-aws-sagemaker
- Example that allows executing the user container with some other context modifications: flytekit-kf-tensorflow
- Example of a Persistence Plugin that allows data to be stored to different persistence layers: flytekit-data-fsspec