Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale.

Home Page · Quick Start · Documentation · Features · Community & Resources · Changelogs · Components

💥 Introduction

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine -- it uses workflow as a core concept, and task (a single unit of execution) as a top-level concept. Multiple tasks arranged in a data producer-consumer order create a workflow.

Workflows and Tasks can be written in any language, with out-of-the-box support for Python, Java and Scala.
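As a rough illustration of the producer-consumer idea and the type-safe dependency graph described above, here is a minimal plain-Python sketch. It does not use flytekit and is not Flyte's API: the `Workflow` class and the example tasks are hypothetical names made up for this illustration, and it only handles single-input tasks chained in a line. The point is that a type mismatch between a producer and its consumer surfaces when the workflow is declared, before anything runs.

```python
from typing import Callable, get_type_hints


class Workflow:
    """Hypothetical linear chain of tasks, type-checked when it is declared."""

    def __init__(self, *tasks: Callable):
        # Compare each producer's return type with its consumer's input type.
        for producer, consumer in zip(tasks, tasks[1:]):
            produced = get_type_hints(producer)["return"]
            hints = get_type_hints(consumer)
            consumed = next(t for name, t in hints.items() if name != "return")
            if produced is not consumed:
                raise TypeError(
                    f"{producer.__name__} produces {produced.__name__}, "
                    f"but {consumer.__name__} expects {consumed.__name__}"
                )
        self.tasks = tasks

    def __call__(self, value):
        # Run the tasks in producer-consumer order.
        for t in self.tasks:
            value = t(value)
        return value


def count_words(text: str) -> int:
    return len(text.split())


def double(n: int) -> int:
    return n * 2


def shout(text: str) -> str:
    return text.upper()


# A valid chain: str -> int -> int.
wf = Workflow(count_words, double)
result = wf("flyte connects typed tasks")

# An invalid chain: count_words returns int, but shout expects str.
# The error is raised at declaration, not while running.
try:
    Workflow(count_words, shout)
    declaration_error = None
except TypeError as exc:
    declaration_error = str(exc)
```

In Flyte itself the equivalent checks are performed by the platform on the declared task interfaces; refer to the Documentation for the actual SDK usage.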

⏳ Five Reasons to Use Flyte

Kubernetes-Native Workflow Automation Platform
Ergonomic SDKs in Python, Java & Scala
Versioned & Auditable
Reproducible Pipelines
Strong Data Typing

🚀 Quick Start

With Docker and Flytectl installed, run the following command:

  flytectl sandbox start

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: Flyte is ready! Flyte UI is available at http://localhost:30081/console.

Visit http://localhost:30081/console to view the Flyte dashboard.

To dig deeper into Flyte, refer to the Documentation.

⭐️ Current Deployments & Contributors

🔥 Features

Used at scale in production by 500+ users at Lyft, with more than 1 million executions and 40+ million container executions per month
A data-aware platform
Enables collaboration across your organization by:
- Executing distributed data pipelines/workflows
- Reusing tasks across projects, users, and workflows
- Making it easy to stitch together workflows from different teams and domain experts
- Backtracing to a specified workflow
- Comparing results of training workflows over time and across pipelines
- Sharing workflows and tasks across your teams
- Simplifying the complexity of multi-step, multi-owner workflows
Quick registration -- start locally and scale to the cloud instantly
Centralized inventory of Tasks, Workflows, and Executions
gRPC / REST interface to define and execute tasks and workflows
Type-safe construction of pipelines -- each task has an interface characterized by its input and output, so illegal construction of pipelines fails during declaration, rather than at runtime
Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
Memoization and Lineage tracking
Provides logging and observability
Workflow features:
- Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
- Parallel step execution
- Extensible backend for adding custom plugins (with a simplified user experience)
- Branching
- Inline subworkflows (a workflow can be embedded within one node of the top-level workflow)
- Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)
- Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)
- Dynamic workflow creation and execution with runtime type safety
- Container side plugins with first-class support in Python
- PreAlpha: Arbitrary flytekit-less containers supported (RawContainer)
Guaranteed reproducibility of pipelines via:
- Versioned data, code, and models
- Automatically tracked executions
- Declarative pipelines
Multi-cloud support (AWS, GCP, and others)
Extensible, modular core with deep observability
No single point of failure; resilient by design
Automated notifications to Slack, Email, and PagerDuty
Multi K8s cluster support
Out-of-the-box support to run Spark jobs on K8s, Hive queries, etc.
Snappy Console
Python CLI and Golang CLI (flytectl)
Written in Golang and optimized for the performance of large running jobs
Grafana templates (user/system observability)
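To make the Array Tasks item above more concrete, here is a small plain-Python sketch of mapping a function over a dataset with a cap on how many calls run at once. It is only an analogy: it uses threads from the standard library rather than containers, and `bounded_map` and `square` are hypothetical names for this illustration, not part of flytekit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bounded_map(fn: Callable[[T], R], inputs: Iterable[T], concurrency: int = 4) -> List[R]:
    """Apply fn to every input, running at most `concurrency` calls at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in input order, whatever order they finish in.
        return list(pool.map(fn, inputs))


def square(x: int) -> int:
    return x * x


# Ten inputs, never more than three in flight at the same time.
squares = bounded_map(square, range(10), concurrency=3)
```

The `concurrency` argument plays the role of the "controlled execution" mentioned in the feature list: the whole dataset is processed, but the number of simultaneous workers stays bounded.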

In Progress

Demos: Distributed PyTorch, feature engineering, etc.
Integrations: Great Expectations, Feast
Least-privilege Minimal Helm Chart
Relaunch execution in recover mode
Documentation as code

🔌 Available Plugins

Containers
K8s Pods
AWS Batch Arrays
K8s Pod Arrays
K8s Spark (native PySpark and Java/Scala)
AWS Athena
Qubole Hive
Presto Queries
Distributed PyTorch (K8s Native) -- PyTorch Operator
SageMaker (builtin algorithms & custom models)
Distributed TensorFlow (K8s Native)
Papermill notebook execution (Python and Spark)
Type-safe data checking for pandas DataFrames using Pandera
Versioned datastores using DoltHub and Dolt
Use SQLAlchemy to query any relational database
Build your own plugins that use library containers

📦 Component Repos

Repo	Language	Purpose	Status
flyte	Kustomize,RST	deployment, documentation, issues	Production-grade
flyteidl	Protobuf	interface definitions	Production-grade
flytepropeller	Go	execution engine	Production-grade
flyteadmin	Go	control plane	Production-grade
flytekit	Python	Python SDK and tools	Production-grade
flyteconsole	TypeScript	admin console	Production-grade
datacatalog	Go	manage input & output artifacts	Production-grade
flyteplugins	Go	flyte plugins	Production-grade
flytestdlib	Go	standard library	Production-grade
flytesnacks	Python	examples, tips, and tricks	Incubating
flytekit-java	Java/Scala	Java & Scala SDK for authoring Flyte workflows	Incubating
flytectl	Go	A standalone Flyte CLI	Incomplete

🔩 Production K8s Operators

Repo	Language	Purpose
Spark	Go	Apache Spark batch
Flink	Go	Apache Flink streaming

Functional Tests Matrix

We run a suite of tests (defined in https://github.com/flyteorg/flytesnacks/blob/master/cookbook/flyte_tests_manifest.json) to ensure that basic functionality and a subset of the integrations work across a variety of release versions. Those tests are run in a cluster where specific versions of the Flyte components, such as console, flyteadmin, datacatalog, and flytepropeller, are installed. The table below lists release versions as columns and the result of each test suite as rows.

workflow group	nightly
core
integrations-hive
integrations-k8s-spark
integrations-kfpytorch
integrations-pod
integrations-pandera_examples
integrations-papermilltasks
integrations-greatexpectations
integrations-sagemaker-pytorch
integrations-sagemaker-training

🤝 Community & Resources

Here are some resources to help you learn more about Flyte.

Communication Channels

Biweekly Community Sync

📣 Flyte OSS Community Sync: every other Tuesday, 9am-10am PDT. Check out the calendar and register to stay up to date with our meeting times, or join us on Zoom.
Upcoming meeting agenda, previous meeting notes, and a backlog of topics are captured in this document.
If you'd like to revisit any previous community sync meetings, you can access the video recordings on Flyte's YouTube channel.

Blog Posts

Flyte blog site

Newsletter

Flyte Monthly

Conference Talks

KubeCon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck
KubeCon 2019 - Running Large-Scale Stateful workloads on Kubernetes at Lyft video
re:Invent 2019 - Implementing ML workflows with Kubernetes and Amazon SageMaker video
Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS video
OSS + ELC NA 2020 splash
Datacouncil video | splash
FB AI@Scale Making MLOps & DataOps a reality
GAIC 2020
OSPOCon 2021:
- Building and Growing an Open Source Community for an Incubating Project video
- Enforcing Data Quality in Data Processing and ML Pipelines with Flyte and Pandera video
- Self-serve Feature Engineering Platform Using Flyte and Feast video
- Efficient Data Parallel Distributed Training with Flyte, Spark & Horovod video
KubeCon+CloudNativeCon North America 2021 - How Spotify Leverages Flyte To Coordinate Financial Analytics Company-Wide session
PyData Global 2021 - Robust, End-to-end Online Machine Learning Applications with Flytekit, Pandera and Streamlit session
ODSC West Reconnect - Deep Dive Into Flyte workshop

Podcasts

TWIML&AI - Scalable and Maintainable ML Workflows at Lyft - Flyte
Software Engineering Daily - Flyte: Lyft Data Processing Platform
MLOps Coffee session - Flyte: an open-source tool for scalable, extensible, and portable workflows
Open Data Science - West Warm Up session with Ketan Umare - Creator of Flyte

💖 All Contributors

A big thank you to the community for making Flyte possible!
