Add support for python pickle type in flytekit/flyte #667
Re "Each task output should write to a unique file path": you do not need this. You are using the Flyte raw output path, so simply use the output name, such as o0.
Can we talk about this? I think I may be missing something. The original issue was to support arbitrary types, not to add a new type, right? So that if a user does
it will still work. I haven't looked at the code change, but this description seems to imply that the user would have to change it to
Am I understanding it correctly? We don't want to make the user do this; the code shouldn't have to change.
Also, let's leave the ast package out of it for now. I think we want the user to make things explicit.
Regarding "Seems like it only happens in local execution, we will fail to register this kind of workflow": if we're going to fail at registration time, we should fail earlier, during local execution.
@wild-endeavor we already talked about this; @pingsutw is either making a new PR or updating this one.
We don't need to change anything now; user-defined classes will be pickled automatically.
Yeah, we planned to remove it after the meeting last week.
Okay, I will make a new PR to address this issue.
Codecov Report
@@ Coverage Diff @@
## master #667 +/- ##
==========================================
+ Coverage 85.80% 85.83% +0.02%
==========================================
Files 358 361 +3
Lines 29793 29947 +154
Branches 2428 2438 +10
==========================================
+ Hits 25564 25705 +141
- Misses 3590 3600 +10
- Partials 639 642 +3
Minor nit, but this is great. +1 after you change the error message.
output = tf.to_python_value(ctx, lv, str)
assert output == python_val
Can we add some more unit tests:
- a task that uses a list of Foo
- a task that uses a dict of str -> Foo
- a workflow that uses it, testing that local workflow execution still works?
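The property the first two requested tests would check is that a list of Foo and a dict of str -> Foo survive a pickle round trip unchanged. A minimal sketch using only the standard-library pickle module, not flytekit; Foo's fields and the helpers to_pickle_file/from_pickle_file are hypothetical, and the workflow-level test would still need flytekit's local execution.

```python
import os
import pickle
import tempfile
import typing


class Foo:
    """Hypothetical user-defined class with no built-in Flyte type."""

    def __init__(self, x: int, y: str) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Foo) and (self.x, self.y) == (other.x, other.y)


def to_pickle_file(value: typing.Any, directory: str, name: str) -> str:
    """Hypothetical helper: pickle `value` to directory/name and return the path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return path


def from_pickle_file(path: str) -> typing.Any:
    """Hypothetical helper: load a value written by to_pickle_file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def test_list_of_foo_round_trip() -> None:
    value = [Foo(1, "a"), Foo(2, "b")]
    with tempfile.TemporaryDirectory() as d:
        assert from_pickle_file(to_pickle_file(value, d, "o0")) == value


def test_dict_of_str_to_foo_round_trip() -> None:
    value = {"first": Foo(1, "a"), "second": Foo(2, "b")}
    with tempfile.TemporaryDirectory() as d:
        assert from_pickle_file(to_pickle_file(value, d, "o0")) == value
```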
flytekit/core/promise.py (outdated)
@@ -78,6 +85,11 @@ def extract_value(
        literal_list = [extract_value(ctx, v, sub_type, flyte_literal_type.collection_type) for v in input_val]
        return _literal_models.Literal(collection=_literal_models.LiteralCollection(literals=literal_list))
    elif isinstance(input_val, dict):
        if (
I don't think these should have to be here. Can you explain why? Why doesn't the recursive extract_value call work? Same for the list.
If the input value is a list of pickles, we serialize the whole list into one pickle file.
Otherwise, we would end up with a huge number of files for a very long list.
Does that make sense?
@kumare3 can you chime in on this? I don't think this is the behavior we want. I think we should get a very long list of files.
@wild-endeavor I actually like getting one file with the whole list in it, since it is all pickled. But not at the cost of some weird implementation. Performance-wise and usability-wise, this is much better.
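The trade-off in this thread can be seen with the standard library alone. A minimal sketch, not the PR's implementation: the file names o0 and o0_<i>, and all helper functions below, are hypothetical.

```python
import os
import pickle
import tempfile
import typing


def write_list_as_one_pickle(values: typing.List[typing.Any], directory: str) -> typing.List[str]:
    """Pickle the whole list into a single file (the behavior described in this PR)."""
    path = os.path.join(directory, "o0")
    with open(path, "wb") as f:
        pickle.dump(values, f)
    return [path]


def write_list_element_wise(values: typing.List[typing.Any], directory: str) -> typing.List[str]:
    """Pickle each element into its own file (the alternative raised in review)."""
    paths = []
    for i, value in enumerate(values):
        path = os.path.join(directory, f"o0_{i}")
        with open(path, "wb") as f:
            pickle.dump(value, f)
        paths.append(path)
    return paths


def read_one(path: str) -> typing.Any:
    with open(path, "rb") as f:
        return pickle.load(f)


def read_element_wise(paths: typing.List[str]) -> typing.List[typing.Any]:
    return [read_one(p) for p in paths]


def demo(n: int = 1000) -> typing.Tuple[int, int, bool]:
    """Return (files for one pickle, files element-wise, whether both read back equal)."""
    values = [{"i": i} for i in range(n)]
    with tempfile.TemporaryDirectory() as d:
        one_dir, many_dir = os.path.join(d, "one"), os.path.join(d, "many")
        os.mkdir(one_dir)
        os.mkdir(many_dir)
        one = write_list_as_one_pickle(values, one_dir)
        many = write_list_element_wise(values, many_dir)
        same = read_one(one[0]) == read_element_wise(many) == values
        return len(one), len(many), same
```

For a 1,000-element list, the first approach writes one file and the second writes 1,000; both read back to the same list. Fewer files typically means fewer storage round trips, at the cost of not being able to load a single element on its own.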
Updated the branch from 32e5eed to c6d219b.
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
flytekit/models/types.py (outdated)
        """
        return self._metadata

    @metadata.setter
I'm getting a unit test failure when trying to set metadata. This is because we had some help from OSS contributors cleaning up the model file structure; see this PR.
Can you add the setters you need to the new location and then delete this file? It shouldn't be here anymore.
Thanks all.
* Init python pickle
* Refactor pickle support
* Fixed lint
* Fixed lint
* Fixed register error
* Added tests
* Fixed lint
* Handle list of pickle
* Fixed lint
* Added assert_type
* Updated comment
* Remove unnecessary cast
* Added more tests
* Fixed test
* Address comment
* Update list of pickle
* Update list of pickle
* Update list of pickle
* Fix tests
* Fix tests
* Fix tests
* Fix tests
* Fixed test
* Fixed test
* Fixed test
* add test (Signed-off-by: Yee Hing Tong <[email protected]>)
* add test (Signed-off-by: Yee Hing Tong <[email protected]>)
* Fixed lint
* Fixed import
* Handle nested list and dict
* Added tests
* Fixed lint

Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>
Signed-off-by: Robert Everson <[email protected]>
Signed-off-by: Kevin Su [email protected]
TL;DR
Add a new type: PythonPickle = FlyteFile[typing.TypeVar(PYTHON_PICKLE_FORMAT)].
Any type that Flyte can't recognize will become PythonPickle, e.g. typing.Any, typing.List[int] -> PythonPickle.
The function's output (a native Python object) is saved in the PythonPickle file.
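A minimal sketch of the fallback rule, assuming a deliberately tiny, hypothetical table of recognized types; flytekit itself resolves types through its TypeEngine, and the names KNOWN_TYPES and flyte_type_for are illustrative only.

```python
import typing

PYTHON_PICKLE = "PythonPickle"

# Hypothetical table of types "Flyte recognizes"; real flytekit uses its TypeEngine.
KNOWN_TYPES: typing.Dict[typing.Any, str] = {
    int: "INTEGER",
    float: "FLOAT",
    str: "STRING",
    bool: "BOOLEAN",
}


def flyte_type_for(python_type: typing.Any) -> str:
    """Return a Flyte-like type name, falling back to PythonPickle for anything else."""
    try:
        return KNOWN_TYPES[python_type]
    except (KeyError, TypeError):
        # KeyError: typing.Any, user-defined classes, and other unlisted types.
        # TypeError: unhashable annotations that cannot be looked up at all.
        return PYTHON_PICKLE
```

Under this rule, a task annotated with a user-defined class or typing.Any gets a pickle-backed output, and its return value is what gets written to that file.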
Let's assume we have two tasks, "A" and "B". There are several cases we need to handle.
Case 1: A.output (Pickle) -> B.input (Pickle)
Case 2: A.output (str) -> B.input (Pickle)
This seems to happen only in local execution; registering this kind of workflow will fail.
We can't use a transformer to convert the Literal to a Python value, since we don't know the literal's Python type.
Task A's output type is str, but task B expects an input of type Any, which falls back to PythonPickle.
In this scenario, we extract the value directly from the scalar, collection, or map.
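Because the declared Python type is unavailable here, the value has to be rebuilt from the literal's shape alone. A minimal sketch using hypothetical, simplified stand-ins (Scalar, Collection, Map) rather than flytekit's actual literal model classes:

```python
import dataclasses
import typing


@dataclasses.dataclass
class Scalar:
    """Hypothetical stand-in for a primitive literal."""
    value: typing.Any


@dataclasses.dataclass
class Collection:
    """Hypothetical stand-in for a list literal."""
    literals: typing.List[typing.Any]


@dataclasses.dataclass
class Map:
    """Hypothetical stand-in for a str -> literal map."""
    literals: typing.Dict[str, typing.Any]


def extract_python_value(lv: typing.Any) -> typing.Any:
    """Recover a plain Python value from the literal's shape, with no declared type."""
    if isinstance(lv, Scalar):
        return lv.value
    if isinstance(lv, Collection):
        # Recurse so nested lists and maps are unwrapped too.
        return [extract_python_value(item) for item in lv.literals]
    if isinstance(lv, Map):
        return {key: extract_python_value(item) for key, item in lv.literals.items()}
    raise TypeError(f"unsupported literal: {lv!r}")
```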
Case 3: A.output (Pickle) -> B.input (str)
This seems to happen only in local execution; registering this kind of workflow will fail.
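One way to follow the review suggestion of failing during local execution, instead of only at registration, is to check each task-to-task edge up front. A minimal sketch; check_edge and the string type names are hypothetical, and it treats Case 2 and Case 3 alike as mismatches.

```python
PYTHON_PICKLE = "PythonPickle"


def check_edge(upstream_type: str, downstream_type: str) -> None:
    """Raise locally for edges that registration would reject.

    Pickle -> pickle (Case 1) passes. Any mix of a pickled and a
    non-pickled type (Cases 2 and 3) raises TypeError. Other
    mismatches are out of scope for this sketch.
    """
    if (upstream_type == PYTHON_PICKLE) != (downstream_type == PYTHON_PICKLE):
        raise TypeError(
            f"cannot connect output of type {upstream_type} to input of type "
            f"{downstream_type}; this workflow would fail to register"
        )
```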
Case 4: Outputs should not be written to the same pickle file
Each task output should be written to a unique file path.
To solve this, the file name combines the module name, function name, and output index,
e.g.
__main__.add_question.o0
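The naming scheme can be sketched with a function's __module__ and __name__ attributes. A minimal sketch; pickle_output_name and pickle_output_path are hypothetical helpers, and the base-directory layout is an assumption.

```python
import os
import typing


def pickle_output_name(func: typing.Callable[..., typing.Any], index: int) -> str:
    """Build '<module>.<function>.o<index>', e.g. '__main__.add_question.o0'."""
    return f"{func.__module__}.{func.__name__}.o{index}"


def pickle_output_path(base_dir: str, func: typing.Callable[..., typing.Any], index: int) -> str:
    """Join the per-output name onto a base directory (hypothetical layout)."""
    return os.path.join(base_dir, pickle_output_name(func, index))
```

Run as a script, a task function named add_question yields __main__.add_question.o0 for its first output. As noted in review, if each output already gets its own raw output path, the output name (o0) alone may be enough.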
Case 5: Flytekit without type annotations
We can use the ast package to find whether a task or workflow has a return value. If a task's output has no type annotation but the task returns a value, we will use PythonPickle by default.
This lets users run a Flyte workflow on Kubernetes more quickly, without having to handle types that Flyte doesn't support.
Here is an example without type annotations; feel free to try it.
Workflow inputs need type annotations for Flyte Console to work. Otherwise, Flyte Console will expect the user to provide a FlyteFile input.
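A minimal sketch of the ast-based check, working on source text (for a live function, inspect.getsource could supply it). The helper names are hypothetical, and the review asked to leave the ast approach out of this PR for now.

```python
import ast
import textwrap


class _ReturnFinder(ast.NodeVisitor):
    """Record whether a function body contains `return <value>`."""

    def __init__(self) -> None:
        self.found = False

    def visit_Return(self, node: ast.Return) -> None:
        # A bare `return` carries no value, so it does not count.
        if node.value is not None:
            self.found = True

    # Returns inside nested functions or classes belong to those scopes: skip them.
    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        return None

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
        return None

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        return None


def has_return_value(source: str) -> bool:
    """Return True if the first top-level function in `source` returns a value."""
    tree = ast.parse(textwrap.dedent(source))
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            finder = _ReturnFinder()
            for stmt in node.body:
                finder.visit(stmt)
            return finder.found
    raise ValueError("no function definition found in source")
```

Following the description, a task without a return annotation for which this returns True would get a PythonPickle output.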
Type
Are all requirements met?
Still a work in progress; I will add tests after I finish.
Complete description
Tracking Issue
flyteorg/flyte#1362
Follow-up issue
NA