Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Support for Native JSON types in flyte #5318

Closed
2 tasks done
kumare3 opened this issue May 3, 2024 · 13 comments
Closed
2 tasks done

[Core feature] Support for Native JSON types in flyte #5318

kumare3 opened this issue May 3, 2024 · 13 comments
Assignees
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request Epic: dataclass size:XXL This PR changes 1000+ lines, ignoring generated files.

Comments

@kumare3
Copy link
Contributor

kumare3 commented May 3, 2024

Motivation: Why do you think this is important?

Currently flyte supports JSON through protobuf Struct. This causes a lot of problems for the users, especially when using ints as Structs only store numbers - which are double values.

Goal: What should the final outcome look like, ideally?

Support a new FlyteIDL.LiteralType called LiteralType.JSON. So users can simply use any object that dumps to a JSON using binary / bytearray as the transport layer.

It should be possible to also have customer json serializers and deserializers for the data like JsonPickle etc

Describe alternatives you've considered

New literalType called LiteralType.JSON
This maps to a Literal of type binary (wire format)
Optionally the metadata / schema can be extracted and stored

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@kumare3 kumare3 added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels May 3, 2024
Copy link

runllm bot commented May 3, 2024

To support native JSON types in Flyte, you propose introducing a new LiteralType.JSON in FlyteIDL, using binary/bytearray for transport, and allowing custom JSON serializers/deserializers. This approach aims to address issues with current JSON handling via protobuf Struct, particularly with integer representation.

Would you like more detail on how to implement this feature or any potential challenges?

Sources:


This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Mention @RunLLM to ask follow-up questions. See runllm.com for more!

@kumare3
Copy link
Contributor Author

kumare3 commented May 3, 2024

Related to #4505

@MortalHappiness
Copy link
Member

#take

@MortalHappiness MortalHappiness removed their assignment May 3, 2024
@thomasjpfan
Copy link
Member

From #4505 (comment), I see the follow Python types that can use Literal.JSON:

  1. dicts -> JsonPickle
  2. dataclasses -> Use Mashumuro's encoders/decorders or the object's to_json + from_json (These methods are defined with DataClassJSONMixin)

@kumare3 Are you considering other Python types that would use Literal.JSON?

@kumare3
Copy link
Contributor Author

kumare3 commented May 3, 2024

I am saying wholesale migration to json for these unsupported types from struct

@pingsutw pingsutw self-assigned this May 6, 2024
@wild-endeavor
Copy link
Contributor

#5337

@Future-Outlier
Copy link
Member

Future-Outlier commented Sep 18, 2024

The PRs will be created as follows:

  • Flyte
    [Flyte][1][IDL] Binary IDL With MessagePack
    [Flyte][2][Literal Type For Scalar] Binary IDL With MessagePack
    [Flyte][3][Attribute Access] Binary IDL With MessagePack
    [Flyte][4][FlyteCTL] Binary IDL With MessagePack
    [Flyte][5][Compiler][Struct Literal Type using JSON SCHEMA] Binary IDL With MessagePack

  • flytekit
    [flytekit][1][SimpleTransformer] Binary IDL With MessagePack
    [flytekit][2][untyped dict] Binary IDL With MessagePack
    [flytekit][3] [list, dict and nested cases] Binary IDL With MessagePack
    [flytekit][4][pure dataclass and nested dataclass] Binary IDL With MessagePack
    [flytekit][5][Attribute Access] Binary IDL With MessagePack
    [flytekit][6][Flyte Types] Binary IDL With MessagePack

@Future-Outlier
Copy link
Member

follow up:
for flytekit, we can suport Dict[int, dataclass] as input, but this need to also change click_types.py, which will be big enough to open a new PR.

@Future-Outlier
Copy link
Member

follow up 2:
we should consider rewrite get_literal_type for Dict[int, dataclasss] attribute access.

@wild-endeavor
Copy link
Contributor

wild-endeavor commented Sep 27, 2024

we should consider rewrite get_literal_type for Dict[int, dataclasss] attribute access

what is this now and what should it be?

@Future-Outlier
Copy link
Member

we should consider rewrite get_literal_type for Dict[int, dataclasss] attribute access

what is this now and what should it be?

now: Dict[int, dataclasss] not supported, and Dict[int, dataclasss] not supported with attribute access
after: Dict[int, dataclasss] supported, and Dict[int, dataclasss] supported with attribute access
get_literal_type is the function to provide each field in the Dict for the propeller to access the attribute's type.

@Future-Outlier
Copy link
Member

Future-Outlier commented Oct 3, 2024

This case can be supported, but we don't have backward compatible issue now, since no users report about this before.

# Status is an Enum class
@dataclass
class DC:
    grid: Dict[str, List[Optional[Union[int, str, float, bool, Status, InnerDC]]]] = field(default_factory=lambda: {
        'all_types': [InnerDC()],
    })

Mashumaro Issue: Fatal1ty/mashumaro#252

@eapolinario
Copy link
Contributor

As you can witness from all the PRs in this issue, there was a lot of movement in this area, and we can finally close this issue as we officially released 1.14.0 last week, which adds an improved support for dataclasses, untyped dictionaries, and pydantic models.

Check the full release log in https://github.com/flyteorg/flytekit/releases/tag/v1.14.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request Epic: dataclass size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

No branches or pull requests

7 participants