Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add type support for pydantic BaseModels #1660
Feat: Add type support for pydantic BaseModels #1660
Changes from 48 commits
fe9434f
b4e6f80
ff73464
e9a714b
17f3441
594026a
1cf2556
82d0773
c6d14a4
1433cd2
640d17a
8325a62
f2140b6
2ea5b44
ccbb70b
b671d3d
240aa30
34b1576
bd686ad
67da286
de92c75
24a4152
1880913
595183d
75ea89b
39acb76
ab448d7
249f43f
f9ac9e9
da81828
95edbab
b42bb4d
7ccaa95
515a748
24a4d0e
68a2844
1466ff8
4c022de
96ef7e4
a477d90
056b069
aa33e31
deb42bb
d55b251
0c4ce8c
bc27331
2026698
0e9f215
4c463de
cf75326
eec501d
0c0a483
01c065e
68ddfa5
11235fb
1e946bb
f58a55b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is my only worry. I don't think I've seen this pattern before (cc @EngHabu @hamersaw @eapolinario) and I'm not sure if there's code that will break in the rest of flyte/flytekit. The literal type returned by this transformer is a
simple
but the value given will be aLiteral
containing aLiteralMap
. Is there any reason why this won't work? I feel like it should be fine, it's still just one binding, but wanted to check. Is this a dumb question? does this pattern already exist elsewhere that I'm forgetting?The reason it's a literalmap is because flyte types are serialized to a separate dict, and a pointer (effectively think of it as a memory pointer) is placed instead - should we investigate adding a metadata field to the Literal message @EngHabu?
@ArthurBook/ @fg91 i guess my main question is why the need for the separate store? is there a binary/byte type in pydantic that can be somehow flagged that we can just serialize the pb message into?
[side note: the broader pattern here is offloading and returning an artifact, which we can of course make json serializable, and then just serialize the Artifact. this complicates the network call of course and this concept should be held off until we turn flytekit's type engine async.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have (naively) used
types.LiteralType(simple=types.SimpleType.STRUCT)
before in an internal plugin and never saw any suspicious behavior. Doesn't mean there isn't anything I have missed though.The pydantic base model is serialized into a json string by pydantic. The offloaded objects are stored in a
dict[str, Literal]
by the logic of this type transformer.Parsing the json string into a protobuf struct (see here) and then saving both the "json struct" and the "store" into a
LiteralMap
was my first idea to put these two into a single literal but I'm open to any other way as well!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change request (blocking): Being fully aware that
type_engine.TypeEngine.get_available_transformers()
was my suggestion, I don't think it gives us the desired result:I would have expected supported types like pandas dataframe or torch tensor/module to be contained in this list. (Also ints, floats, ... should be excluded as well. What about BaseModels themselves?)
We might have to go with an explicit list after all.
@pingsutw do you know any other mechanism to discover all type transformers, including ones coming from (potentially private) plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, think we can figure out something better here.
I'll leave this as an open for now waiting for a reply from @pingsutw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lazily loaded some transformers in the new flytekit, so you didn't see pandas, and tensor, etc. we can add lazy_load_transformer to get_available_transformers. therefore, we'll be able to see the pandas in the _REGISTRY.keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pingsutw is out for a bit. could you help me understand the reasoning for being aware of the transformers registered? I don't think knowing a priori all the types is necessarily the right approach. Would it be possible to make this registerable? Like, stock it comes with a certain set of types. If for some reason during the operation of a task, a user wants to make Pydantic aware of a certain type, they can add a flytekit TypeEngine transformer for that type, and then register that type with this pydantic engine. that's kinda how the structured dataset transformer works. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't really understand these validators. is this really required? is this a pydantic thing? i feel like we should let the base flytekit type engine handle the transformations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC one could consider validators as the
to_python_value
function of pydantic when deserializing.