[Docs] MessagePack IDL, Pydantic Support, and Attribute Access #6022

Merged
merged 20 commits into from
Nov 21, 2024
Changes from 14 commits
16 changes: 10 additions & 6 deletions docs/user_guide/data_types_and_io/accessing_attributes.md
@@ -11,6 +11,10 @@ Note that while this functionality may appear to be the normal behavior of Pytho
Consequently, accessing attributes in this manner is, in fact, a specially implemented feature.
This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures.

```{important}
Flytekit version >= v1.14.0 supports Pydantic BaseModel V2, so you can also access attributes on Pydantic BaseModel V2 outputs.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```
@@ -19,7 +23,7 @@ To begin, import the required dependencies and define a common task for subseque

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
- :lines: 1-10
+ :lines: 1-9
```

## List
@@ -31,38 +35,38 @@ Flyte currently does not support output promise access through list slicing.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
- :lines: 14-23
+ :lines: 13-22
```

## Dictionary
Access the output dictionary by specifying the key.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
- :lines: 27-35
+ :lines: 26-34
```

## Data class
Directly access an attribute of a dataclass.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
- :lines: 39-53
+ :lines: 38-51
```

## Complex type
Combinations of list, dict and dataclass also work effectively.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
- :lines: 57-80
+ :lines: 55-78
```
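
As a rough illustration of why attribute access on task outputs has to be specially implemented, the sketch below records each index, key, or attribute lookup on a placeholder and replays it once the real value exists. The `OutputRef` class, its `resolve` method, and the `Inner`/`Outer` dataclasses are hypothetical names invented for this example; this is not Flytekit's implementation, only a plain-Python sketch of the idea under those assumptions.

```python
from dataclasses import dataclass
from typing import Any


class OutputRef:
    """Hypothetical placeholder for a task output that does not exist yet."""

    def __init__(self, name: str, path: tuple = ()):
        self._name = name
        self._path = path

    def __getitem__(self, key: Any) -> "OutputRef":
        # List index or dict key: record the lookup instead of evaluating it.
        return OutputRef(self._name, self._path + (("item", key),))

    def __getattr__(self, attr: str) -> "OutputRef":
        # Only reached for names that normal lookup cannot find.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return OutputRef(self._name, self._path + (("attr", attr),))

    def resolve(self, value: Any) -> Any:
        """Replay the recorded lookups against the real value."""
        for kind, key in self._path:
            value = value[key] if kind == "item" else getattr(value, key)
        return value


@dataclass
class Inner:
    tags: dict


@dataclass
class Outer:
    items: list


# Build the access path before any real value is available.
ref = OutputRef("o0").items[0].tags["b"]

# Later, once the "task" has produced its output, resolve the path.
real = Outer(items=[Inner(tags={"a": 1, "b": 2})])
print(ref.resolve(real))  # 2
```

The same placeholder handles lists, dictionaries, and dataclasses because each lookup is stored as data rather than executed immediately.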

You can run all the workflows locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
- :lines: 84-88
+ :lines: 82-86
```

## Failure scenario
25 changes: 24 additions & 1 deletion docs/user_guide/data_types_and_io/dataclass.md
@@ -11,8 +11,31 @@ When you've multiple values that you want to send across Flyte entities, you can
Flytekit uses the [Mashumaro library](https://github.com/Fatal1ty/mashumaro)
to serialize and deserialize dataclasses.
Future-Outlier marked this conversation as resolved.

With the 1.14 release, `flytekit` adopted `MessagePack` as the
serialization format for dataclasses, overcoming a major limitation of previous versions, which serialized dataclasses into a JSON string within a Protobuf `struct` datatype:

davidmirror-ops marked this conversation as resolved.
to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
Contributor

no need to insert a new line

Member Author

I think insert a new line is more readable


By default, `flytekit >= 1.14` will produce `msgpack` bytes literals when serializing dataclasses.
Future-Outlier marked this conversation as resolved.
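
To make the `int`-to-`float` limitation described above concrete, here is a minimal sketch that uses only the standard library. It does not call Protobuf or Flytekit: the `through_struct_like` helper is a hypothetical emulation of a container that stores every number as a double, and the fields of `Datum` are assumed for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Datum:
    x: int
    y: str


def through_struct_like(values: dict) -> dict:
    """Emulate a struct-style container in which every number is a double."""

    def convert(v):
        if isinstance(v, bool):
            return v  # bool is a subclass of int, so keep it untouched
        if isinstance(v, int):
            return float(v)
        if isinstance(v, dict):
            return {k: convert(val) for k, val in v.items()}
        if isinstance(v, list):
            return [convert(i) for i in v]
        return v

    # Round-trip through a JSON string, then apply the number conversion.
    return convert(json.loads(json.dumps(values)))


restored = through_struct_like(asdict(Datum(x=1, y="a")))
print(restored)  # {'x': 1.0, 'y': 'a'}

# The kind of boilerplate users had to write to get the int back.
fixed = Datum(x=int(restored["x"]), y=restored["y"])
print(type(fixed.x).__name__)  # int
```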

:::{important}

Contributor

Suggested change
By default, `flytekit >= 1.14` will produce `msgpack` bytes literals when serializing dataclasses.

If you're serializing dataclasses using `flytekit` version >= v1.14.0 and you want to produce Protobuf `struct`
literals instead, you can set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
Contributor

no need to insert a new line, I think it breaks the code format


:::{important}
- If you're using Flytekit version below v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
+ If you're using Flytekit version < v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
:::

:::{important}
Future-Outlier marked this conversation as resolved.
Flytekit version < v1.14.0 will produce protobuf struct literal for dataclasses.

Flytekit version >= v1.14.0 will produce msgpack bytes literal for dataclasses.

If you're using Flytekit version >= v1.14.0 and you want to produce protobuf struct literal for dataclasses, you can
Contributor

It would be good to mention why would a user want to produce protobuf struct literal instead of msgpack bytes

Contributor

Suggested change
Flytekit version < v1.14.0 will produce protobuf struct literal for dataclasses.
Flytekit version >= v1.14.0 will produce msgpack bytes literal for dataclasses.
If you're using Flytekit version >= v1.14.0 and you want to produce protobuf struct literal for dataclasses, you can

set environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
Contributor

Suggested change
set environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.

This was already mentioned above

Also in the readthedocs build, you can see there are two important blocks nested


For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::

```{note}
3 changes: 2 additions & 1 deletion docs/user_guide/data_types_and_io/index.md
@@ -114,7 +114,7 @@ Here's a breakdown of these mappings:
- Use ``pyspark.DataFrame`` as a type hint.
* - ``pydantic.BaseModel``
- ``Map``
-       - To utilize the type, install the ``flytekitplugins-pydantic`` plugin.
+       - To utilize the type, install the ``pydantic>2`` module.
- Use ``pydantic.BaseModel`` as a type hint.
* - ``torch.Tensor`` / ``torch.nn.Module``
- File
@@ -144,6 +144,7 @@ flytefile
flytedirectory
structureddataset
dataclass
pydantic_basemodel
accessing_attributes
pytorch_type
enum_type
104 changes: 104 additions & 0 deletions docs/user_guide/data_types_and_io/pydantic_basemodel.md
@@ -0,0 +1,104 @@
(pydantic_basemodel)=

# Pydantic BaseModel

```{eval-rst}
.. tags:: Basic
```

`flytekit` version >=1.14 natively supports the `JSON` format that Pydantic `BaseModel` produces, enhancing the
interoperability of Pydantic BaseModels with the Flyte type system.
Contributor

no need to insert new line


:::{important}
Pydantic BaseModel V2 only works when you are using flytekit version >= v1.14.0.
:::

With the 1.14 release, `flytekit` adopted `MessagePack` as the serialization format for Pydantic `BaseModel`,
overcoming a major limitation of previous versions, which serialized into a JSON string within a Protobuf `struct` datatype:

Contributor

Suggested change

to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
Contributor

I think new line after a colon (unless a list) looks strange


By default, `flytekit >= 1.14` will produce `msgpack` bytes literals when serializing Pydantic BaseModels, preserving the types defined in your `BaseModel` class.
Future-Outlier marked this conversation as resolved.
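
One way to see how a MessagePack payload can preserve the distinction between `int` and `float` is to look at its type tags. The sketch below hand-encodes a few numbers with the standard library instead of using a MessagePack package; `pack_number` and `unpack_number` are hypothetical helpers that cover only small non-negative integers and 64-bit floats, so treat it as an illustration of the wire format rather than of what Flytekit does internally.

```python
import struct


def pack_number(value) -> bytes:
    """Encode an int in 0..255 or a float using MessagePack type tags."""
    if isinstance(value, bool):
        raise TypeError("bool is out of scope for this sketch")
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])  # positive fixint: the byte is the value
        if 0 <= value <= 0xFF:
            return b"\xcc" + struct.pack(">B", value)  # uint 8
        raise ValueError("only 0..255 is handled in this sketch")
    if isinstance(value, float):
        return b"\xcb" + struct.pack(">d", value)  # float 64, big-endian
    raise TypeError(f"unsupported type: {type(value).__name__}")


def unpack_number(data: bytes):
    """Decode bytes produced by pack_number, restoring the original type."""
    tag = data[0]
    if tag <= 0x7F:
        return tag
    if tag == 0xCC:
        return data[1]
    if tag == 0xCB:
        return struct.unpack(">d", data[1:9])[0]
    raise ValueError(f"unsupported tag: {tag:#x}")


print(pack_number(1).hex())    # 01
print(pack_number(1.0).hex())  # cb3ff0000000000000
print(type(unpack_number(pack_number(1))).__name__)    # int
print(type(unpack_number(pack_number(1.0))).__name__)  # float
```

Because the integer and the float are written with different tags, the decoder can return each value with its original type.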

:::{important}
Future-Outlier marked this conversation as resolved.
If you're serializing Pydantic BaseModels using `flytekit` version >= v1.14.0 and you want to produce Protobuf `struct` literals instead, you can set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
Future-Outlier marked this conversation as resolved.

For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::
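
As a small sketch of the opt-out described above, the variable can be set from Python before any serialization happens. Only the variable name `FLYTE_USE_OLD_DC_FORMAT` and the value `true` come from this page; the `wants_old_dc_format` helper is hypothetical and only shows one plausible way such a flag could be read, not how Flytekit reads it.

```python
import os

# Opt back in to the older Protobuf struct format for this process.
os.environ["FLYTE_USE_OLD_DC_FORMAT"] = "true"


def wants_old_dc_format() -> bool:
    """Hypothetical helper: report whether the opt-out variable is set to true."""
    value = os.environ.get("FLYTE_USE_OLD_DC_FORMAT", "false")
    return value.strip().lower() == "true"


print(wants_old_dc_format())  # True
```

Setting the variable through `os.environ` affects only the current process and the child processes it starts, so it needs to be set wherever the serialization actually runs.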

```{note}
You can use dataclasses and Flyte types (FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset) as fields in a Pydantic BaseModel.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```

To begin, import the necessary dependencies:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 1-9
```

Build your custom image with ImageSpec:
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 11-14
```

## Python types
We define a `pydantic basemodel` with `int`, `str` and `dict` as the data types.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: Datum
```

You can send a `pydantic basemodel` between different tasks written in various languages, and input it through the Flyte console as raw JSON.

:::{note}
All fields in a Pydantic BaseModel should be **annotated with their type**. Failure to do so will result in an error.
:::

Once declared, a Pydantic BaseModel can be returned as an output or accepted as an input.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 26-41
```

## Flyte types
We also define a Pydantic BaseModel that accepts {std:ref}`StructuredDataset <structured_dataset>`,
{std:ref}`FlyteFile <files>` and {std:ref}`FlyteDirectory <folder>`.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 45-86
```

A Pydantic BaseModel can hold values of Python types, dataclasses,
FlyteFile, FlyteDirectory and StructuredDataset.

We define a workflow that calls the tasks created above.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: basemodel_wf
```

You can run the workflow locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 99-100
```

To run the workflow with `pyflyte run`, pass its inputs as command-line arguments:
```
pyflyte run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/b71e01d45037cea883883f33d8d93f258b9a5023/examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py \
basemodel_wf --x 1 --y 2
```

[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/