[RFC] ArrayNode #3446
Signed-off-by: Daniel Rammer <[email protected]>
I don't have much context on the inner workings of the existing k8s_array plugin or on exactly how node traversal works within flytepropeller.
This does seem really nice from a user perspective, and I have run into a lot of the missing feature parity of the existing array_task.
I'm also happy to help implement this if there are bits I could help with. It would be nice to get a better understanding of the flytepropeller implementation.
Many of the details about the inner workings of map tasks were new to me, and I don't feel I can really judge the performance implications of the proposed changes.
However, the following points listed in the proposal are very compelling in my eyes and make this a worthwhile endeavour:
- Support mapping over non-K8s tasks
- Cache would be functionally complete
- Support for intra task checkpointing
- Recoverability
- Multiple input values
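As a rough illustration of two of these points (multiple input values and per-element caching), here is a plain-Python sketch. It is not Flyte or flytepropeller code: `map_over_inputs` and its in-memory `cache` dict are hypothetical names used only to show the intended semantics, and it assumes the mapped values are hashable.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical cache key: a tuple of (input name, value) pairs for one element.
CacheKey = Tuple[Tuple[str, Any], ...]


def map_over_inputs(
    fn: Callable[..., Any],
    cache: Dict[CacheKey, Any],
    **input_lists: List[Any],
) -> List[Any]:
    """Apply fn element-wise over several equal-length input lists.

    Each element is looked up in the cache on its own, so a partial cache
    hit only re-runs the elements that are missing.
    """
    lengths = {len(values) for values in input_lists.values()}
    if len(lengths) != 1:
        raise ValueError("all mapped inputs must have the same length")

    names = sorted(input_lists)
    outputs = []
    for values in zip(*(input_lists[name] for name in names)):
        key = tuple(zip(names, values))
        if key not in cache:
            # Cache miss: run the function for this element only.
            cache[key] = fn(**dict(key))
        outputs.append(cache[key])
    return outputs


calls = []


def add(a: int, b: int) -> int:
    calls.append((a, b))
    return a + b


cache: Dict[CacheKey, Any] = {}
results = map_over_inputs(add, cache, a=[1, 2, 1], b=[10, 20, 10])
print(results)     # [11, 22, 11]
print(len(calls))  # 2, because the third element is served from the cache
```

In this sketch the third element repeats the first, so it is answered from the cache instead of being recomputed.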
- Storing separate inputs in the blobstore. This is very inefficient and should be used as a last resort.
- Some other fancy solution we hack together in the heat of the moment.

#### flytekit
Maybe not in scope for this RFC, but I wanted to ask: will this proposal also support inputs that are "mappable", e.g. StructuredDataset?
IIUC that is entirely a flytekit construct and shouldn't need any backend work. So outside the scope of this proposal.
Tracking issue
#1131
Describe your changes
Introduce ArrayNodes for a functionally complete MapTask implementation.
Check all the applicable boxes
Screenshots
NA
Note to reviewers
NA