Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] ArrayNode #3446

Merged
merged 3 commits into from
May 4, 2023
Merged

[RFC] ArrayNode #3446

merged 3 commits into from
May 4, 2023

Conversation

hamersaw
Copy link
Contributor

Tracking issue

#1131

Describe your changes

Introduce ArrayNodes for a functionally complete MapTask implementation.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Screenshots

NA

Note to reviewers

NA

Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
@davidmirror-ops davidmirror-ops added the rfc A label for RFC issues label Mar 29, 2023
bstadlbauer
bstadlbauer previously approved these changes Mar 30, 2023
Copy link
Member

@bstadlbauer bstadlbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have too much context on the inner workings of the existing k8s_array plugin or the exact inner workings of node traversals within flytepropeller.
This does seem really nice from a user perspective, and I have experienced a lot of the feature-imparity of the existing array_task.

Also happy to help implement this in case there are bits I could help with. Would be nice to get a better understanding of the flytepropeller implementation.

rfc/system/3346-array-node.md Show resolved Hide resolved
rfc/system/3346-array-node.md Outdated Show resolved Hide resolved
rfc/system/3346-array-node.md Outdated Show resolved Hide resolved
rfc/system/3346-array-node.md Outdated Show resolved Hide resolved
fg91
fg91 previously approved these changes Mar 30, 2023
Copy link
Member

@fg91 fg91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me many of the details about the inner workings of map tasks were new and I feel I cannot really judge the performance implications of the proposed changes.

However, the following points listed in the proposal are very compelling in my eyes, making this a well-worth endeavour:

  • Support mapping over non-K8s tasks
  • Cache would be functionally complete
  • Support for intra task checkpointing
  • Recoverability
  • Multiple input values

Signed-off-by: Daniel Rammer <[email protected]>
@hamersaw hamersaw dismissed stale reviews from fg91 and bstadlbauer via 4c33ed4 April 3, 2023 19:20
@hamersaw hamersaw requested review from fg91 and bstadlbauer May 3, 2023 19:19
- Storing separate inputs in the blobstore. This is very inefficient and should be used as a last resort.
- Some other fancy solution we hack together in the heat of the moment.

#### flytekit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not in scope for this RFC, but wanted to ask: will this proposal also support inputs that are "mappable", e.g. StructuredDataset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC that is entirely a flytekit construct and shouldn't need any backend work. So outside the scope of this proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc A label for RFC issues
Projects
Status: Implemented
Development

Successfully merging this pull request may close these issues.

7 participants