[Proposal]: Flyte System Tags and metadata #3320

Merged: 11 commits, Jul 20, 2023
124 changes: 124 additions & 0 deletions rfc/system/0001-flyte-execution-tags.md
@@ -0,0 +1,124 @@
# Flyte Execution Tags and Metadata

**Authors:**

- @kumare3

## 1 Executive Summary

Flyte currently provides no visual way of grouping executions and other
entities within a “project” and “domain”. Usually a project is used by a team of
engineers and/or researchers, and sometimes they may be experimenting in
isolation or running multiple cohorts of experiments. There are also cases in
which grouping executions might make them easier to find and
associate with one another. This document proposes a way to
improve the experience of discovering executions within Flyte. It also explains
how this feature could benefit other ecosystem projects.

## 2 Motivation

As a user, I want to:
- Group a certain number of executions into an experiment group, for ease of debugging and discovery.
- Mark certain executions as “blessed” or “released”. This could be done by providing a semantic version after the execution succeeds.
- Group all “sub launchplan” executions with the parent execution.
- Let external systems group executions based on some identifiers.
- Name my executions without having to worry about character limits, uniqueness constraints, or a limited character set.
- Simplify filtering of certain executions.

## 3 Proposed Implementation

### Support for tags

We propose to solve the problem of discovery by supporting arbitrary metadata association with an entity. This is similar to the concept of “tags” in AWS. The tags are represented as `“key”: “value”` pairs. In Kubernetes this can be represented using “labels” and “annotations”. Labels and annotations are already supported per execution, as documented in [ExecutionCreateRequest](https://docs.flyte.org/projects/flyteidl/en/latest/protos/docs/admin/admin.html#executioncreaterequest) -> [ExecutionSpec](https://docs.flyte.org/projects/flyteidl/en/latest/protos/docs/admin/admin.html#executionspec). Moreover, every project supports default labels, see [Project](https://docs.flyte.org/projects/flyteidl/en/latest/protos/docs/admin/admin.html#project). Thus the final execution will have the union of the project default labels and the user-specified labels, as it does today.
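
The union described above can be sketched in a few lines of Python. This is only an illustration: `merge_labels` is a hypothetical helper, not a Flyte API, and the rule that a user-specified value wins on a key collision is an assumption, since the text does not state a precedence.

```python
from typing import Dict, Optional


def merge_labels(
    project_defaults: Optional[Dict[str, str]],
    user_labels: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Return the union of project default labels and user-specified labels.

    Assumption: on a key collision the user-specified value overrides
    the project default.
    """
    merged = dict(project_defaults or {})  # copy so the inputs stay untouched
    merged.update(user_labels or {})
    return merged


# Hypothetical label values, for illustration only.
project_defaults = {"team": "ml-platform", "env": "dev"}
user_labels = {"experiment": "baseline", "env": "staging"}
print(merge_labels(project_defaults, user_labels))
```

If project defaults should take precedence instead, swapping the two arguments inside the helper is enough.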

Currently the resultant labels are not persisted and are only applied to the
execution in Kubernetes. As a first step, we recommend that these labels be
persisted with their associated execution and that the ListExecutions API be updated
to:
- return executions filtered by labels, with support for `and` and `or` queries
- return all executions associated with each execution
- limit the total number of labels per execution to 10-15
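
To make the `and` / `or` filtering and the per-execution label limit concrete, here is a minimal in-memory sketch. It is not how FlyteAdmin would implement the query; `Execution`, `filter_executions`, and the cap of 15 are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# Assumed cap, taken from the upper end of the 10-15 range suggested above.
MAX_LABELS_PER_EXECUTION = 15


@dataclass
class Execution:
    """Hypothetical stand-in for a persisted execution record."""

    name: str
    labels: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject executions that carry more labels than the assumed cap.
        if len(self.labels) > MAX_LABELS_PER_EXECUTION:
            raise ValueError(
                f"{self.name}: {len(self.labels)} labels exceeds the limit of "
                f"{MAX_LABELS_PER_EXECUTION}"
            )


def filter_executions(
    executions: Sequence[Execution],
    pairs: Sequence[Tuple[str, str]],
    mode: str = "and",
) -> List[Execution]:
    """Keep executions whose labels match all ("and") or any ("or") of the pairs."""
    if mode not in ("and", "or"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if not pairs:
        return list(executions)  # no filter given: return everything
    combine = all if mode == "and" else any
    return [
        e for e in executions
        if combine(e.labels.get(key) == value for key, value in pairs)
    ]


runs = [
    Execution("run-a", {"group": "vision", "experiment": "baseline"}),
    Execution("run-b", {"group": "vision", "experiment": "augmented"}),
    Execution("run-c", {"group": "nlp", "experiment": "baseline"}),
]
query = [("group", "vision"), ("experiment", "baseline")]
print([e.name for e in filter_executions(runs, query, "and")])  # ['run-a']
print([e.name for e in filter_executions(runs, query, "or")])   # ['run-a', 'run-b', 'run-c']
```

A real implementation would push the filter into the database query rather than scanning in memory; the sketch only pins down the intended semantics.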

Once this is implemented the UI and CLI can be updated to support these
queries.

### CLI Interface

A workflow or task can be executed using

```bash
pyflyte run --remote --labels k:v --labels k1:v1 test.py wf --input1=10
```
(or equivalently in flytectl)

Contributor

I would probably call the arg here `--tag` to avoid confusing it with Kubernetes labels.
And then probably assign the tags to k8s annotations.

Contributor Author

great point, but the rpc field is sadly already called label. and these will become k8s labels

flytectl and flyteremote can support filtering of executions by labels. Example
in flytectl,
```bash
flytectl get execution -p flytesnacks -d development --filter.labels="k:v"
```

### UI Interface

Two approaches

#### Approach 1: Treat all labels the same way and allow search/filter and click based grouping
In this approach users will get the regular executions view with all the labels
available on each execution. Users can filter executions simply
by clicking on a label, after which all executions are filtered by that label.

#### Approach 2: Certain label keys are treated specially
Member

If this approach is chosen, I wonder whether it would be nicer for the user to do

`pyflyte run --remote --group foo --experiment bar ...`

instead of

`pyflyte run --remote --labels group:foo --labels experiment:bar ...`

This doesn't mean that under the hood the labels mechanism couldn't be used.

Member

I'm somewhat opposed to this as it could be confusing to users as to what is a label vs what is a keyword cli argument 🤔

Member

Or do we wanna create group and experiment as CLI arguments and introduce them as a concept? 🤔

Member

@fg91, May 11, 2023

I personally prefer option 1: treat all labels the same way. Users might not want to follow the categories we deem sensible. Experiment tracking servers like Mlflow or Wandb, which also have such a tagging mechanism, simply allow users to assign arbitrary tags. I would argue that ML engineers are used to this and we should provide the same UX without imposing special naming conventions.

Only exception: execution name
I find it really helpful to have the pod names include customizable identifiers.
We have a registration script, similar to `pyflyte run`, which has an `--execution_name` arg. The user-provided value is suffixed with a random uuid, as is currently already chosen for the execution ids, and the result is checked against the execution name regex again and then passed to `FlyteRemote.execute(execution_name=...)` (already supported, see here). So I wouldn't handle the execution name with a pod label but with the pod's `metadata.name`.

This comment is another argument for not treating execution names with labels but instead metadata.name since I agree that tags need to be mutable.

Member

@bstadlbauer @fg91 @elibixby @flixr @goyalankit Some questions for you

  1. Do you prefer key-value pair tags or tags that only have key?
  2. Should we add tags to Kubernetes label?

Currently, the execution spec (with labels) is serialized to bytes and stored in the execution table, so it's impossible to add / delete / update tags. If we use the k8s client to filter flyteworkflow (CRD) by labels, we cannot search for an execution after the CR is deleted.

I have a PR that adds a tags table. It allows us to easily add / update / delete tags, and even attach tags to a task / workflow / project. However, it does not support key-value pair tags for now. If we decide to use key-value pair tags, I just need to add a new column to the tags table and update the query. I'd like to know your thoughts first.

Btw, the current implementation works with both MySQL and Postgres.

Contributor

My $0.02 is let's keep it simple and support what you call key-only tags.

A person can 'hack' this to resemble key-value if needed (ie 'costcenter-12'), but we don't need to manage that complexity on the back end or in the UI when we get to figuring out how to let folks use tags to sort/group things.

Member

  • In my opinion key-only tags are perfectly fine and what ML engineers are used to from experiment tracking servers

  • Should we add tags to Kubernetes label

    I think being able to add/delete/update tags after the execution has already started or ended is an important feature. User story: an experiment is training/trained really well and I want to mark it for later. This is something that is not known when starting the execution. But updating/deleting/adding tags when the execution is already running would mean that the k8s labels are not in sync with what is stored in the tags table. I'd therefore say that I wouldn't apply the tags as labels to k8s.

Member

Sorry for the late response here but agreed with what's been said above. Key only tags would also solve all our usecases 👍

> I think being able to add/delete/update tags after the execution has already started or ended is an important feature

+1 to this and the reasoning of not applying those to k8s

- “group” will group everything
- “experiment” will also group everything with higher priority.
- “name” will override the execution id with the name?
Contributor

@goyalankit, Feb 7, 2023

Are users allowed to change the labels? If they are then overriding might be an issue since you might have already fired async events to external systems. So I think it might be useful to maintain executionID as an identifier that can't be modified once execution has been created.

Alternatively, this could be an alias to the execution ID rather than overriding?

Contributor Author

executionID cannot be changed - it is immutable and unique per project/domain.
name is just an alias. I will update the doc to reflect this

Contributor Author

but i do like the idea of immutable labels as well. once added you cannot change them


So we'll support both mutable and immutable labels?


![Grouping / Filtering UX](https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/rfc/tags/labels-filter.png)


### Support for descriptions

Users want to describe their executions and especially once an experiment
succeeds they may want to add a lot more description and data about the
experiment. Thus, we propose to add descriptions to the execution as well.

Allow adding a description when starting an execution:
```bash
pyflyte run --remote --labels k:v --labels k1:v1 --description "........" test.py wf --input1=10
```

It should be possible to provide the description as a Markdown file:
```bash
pyflyte run --remote --labels k:v --labels k1:v1 --description README.md test.py wf --input1=10
```
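
The two invocations above pass either inline text or a Markdown file to the same `--description` flag. One way a CLI could tell them apart is sketched below. `resolve_description` is a hypothetical helper and not part of pyflyte, and the rule "treat the value as a file only if it ends in `.md` and exists" is an assumption.

```python
from pathlib import Path


def resolve_description(value: str) -> str:
    """Return the description text for a --description argument.

    Assumption: a value that ends in ".md" and points at an existing file
    is read from disk; anything else is used as inline text.
    """
    candidate = Path(value)
    try:
        if candidate.suffix.lower() == ".md" and candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    except OSError:
        # The value is not usable as a path (for example, it is too long).
        pass
    return value


print(resolve_description("Baseline run with default hyperparameters"))
```

Falling back to inline text keeps the flag forgiving, at the cost of silently accepting a mistyped file name as the description itself.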

It should be possible to add a description for an execution in the UI, after
the execution has been created.
![UI Descriptions](https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/rfc/tags/description-edit.png)

## 4 Metrics & Dashboards
NA

## 5 Drawbacks
It is important to understand that this may add a little more stress to the
metadata database. A back-of-the-envelope calculation:

- 1 million executions * 20 labels each
- each label has a "key" and a "value"; the key is 10 characters and the value is 64
  characters
- 1,000,000 * 20 * (10 + 64) bytes ≈ 1.5 * 10^9 bytes (assuming one byte per character), i.e. about 1.5 GB

This is not significant, but it will grow as the number of executions increases.
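
The estimate can be reproduced with a short script. The function name is hypothetical, and the defaults are the figures from the calculation above, including the assumption of one byte per character.

```python
def estimate_label_storage_bytes(
    executions: int = 1_000_000,
    labels_per_execution: int = 20,
    key_chars: int = 10,
    value_chars: int = 64,
    bytes_per_char: int = 1,  # assumption stated in the RFC
) -> int:
    """Raw size of all label keys and values, ignoring row and index overhead."""
    per_label = (key_chars + value_chars) * bytes_per_char
    return executions * labels_per_execution * per_label


total = estimate_label_storage_bytes()
print(f"{total:,} bytes ~ {total / 1e9:.2f} GB")  # 1,480,000,000 bytes ~ 1.48 GB
```

Database row and index overhead would add to this figure, so it is a lower bound rather than a sizing guide.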

## 6 Alternatives
NA


## 7 Potential Impact and Dependencies
This is one of the most requested features in Flyte and will solve
a lot of problems.


## 8 Unresolved questions
NA

## 9 Conclusion
WIP