[Core feature] Add PodTemplate configuration to flytekit task definitions #3123
Comments
I really like this proposal since it would solve a problem I currently face: our ML engineers would like to be able to switch between T4, A100, and V100 GPUs. On GKE, the resource name is the same for every GPU type; instead, GKE adds a label to the nodes. One could use a Pod task configured with a node selector. However, this only works as a replacement for Python tasks, whereas we need it to work for PyTorch tasks. I had the idea to expose not the pod template but the name of the pod template in the task decorator:

    @task(
        requests=Resources(),
        pod_template_name="v100_pod_template",
    )

I feel this would align well with Flyte's current approach where
Non-ops-savvy ML engineers might not like the UX of having to specify a complete pod template. As the MLOps engineer (or a K8s-savvy ML engineer) who runs the platform for them, however, I don't mind maintaining a set of pod templates for them, managed as IaC along with the rest of the infrastructure. This would also bloat the task definition less.
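As a rough illustration of the named-template approach, the plain-Python sketch below models a registry that a platform team might maintain as IaC, with each template pinning a GPU type through a node selector while the task itself only asks for a GPU count. `POD_TEMPLATES` and `pod_spec_for_task` are hypothetical, and dictionaries stand in for real Kubernetes objects. The `cloud.google.com/gke-accelerator` label and the `nvidia-tesla-*` values are what GKE typically applies to GPU nodes, but verify them against your own node pools.

```python
import copy

# Hypothetical registry maintained by the platform team (e.g. rendered from IaC).
# Each template selects a GPU type via the node label GKE puts on GPU nodes.
POD_TEMPLATES = {
    "t4_pod_template": {"nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-tesla-t4"}},
    "v100_pod_template": {"nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-tesla-v100"}},
    "a100_pod_template": {"nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-tesla-a100"}},
}


def pod_spec_for_task(pod_template_name: str, gpu_count: int) -> dict:
    """Build a pod spec from a named template plus the task's GPU request.

    Hypothetical helper: the GPU *type* comes from the template's node
    selector, while the GPU *count* comes from the task's resource request,
    since the resource name is the same for every GPU type.
    """
    if pod_template_name not in POD_TEMPLATES:
        raise KeyError(f"unknown pod template: {pod_template_name!r}")
    spec = copy.deepcopy(POD_TEMPLATES[pod_template_name])  # never mutate the registry
    spec["containers"] = [
        {
            "name": "primary",
            "resources": {"limits": {"nvidia.com/gpu": str(gpu_count)}},
        }
    ]
    return spec


spec = pod_spec_for_task("v100_pod_template", gpu_count=1)
```

Because the GPU type lives in the template, switching from a V100 to a T4 in this sketch means changing only the template name, not the task's resource request.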
I need the same (especially
About exposing the pod template or the pod template name in the task decorator: of course, only one would be enough. In my opinion, exposing only the name bloats the decorator less, but if others prefer the other solution, I'm just as happy with that one!
I vote for exposing the pod spec directly; otherwise, an admin always needs to create an appropriate pod template in the cluster first, and users cannot easily inspect those templates.
We had a chance to discuss this at some length today and decided to move forward supporting both approaches. So the configuration value can be either a
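Supporting both approaches implies that the task-level setting must be resolved to a concrete template before the pod is built. A minimal sketch, assuming the value is either the name of a template an admin created in the cluster or an inline template written in the task definition; `resolve_pod_template` and the `cluster_templates` mapping are hypothetical stand-ins, not Flyte's API.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Optional, Union


def resolve_pod_template(
    value: Optional[Union[str, Mapping]],
    cluster_templates: Mapping[str, dict],
) -> Optional[dict]:
    """Turn a task-level setting into a concrete base template (hypothetical)."""
    if value is None:
        # Nothing configured on the task: keep the backend's default behavior.
        return None
    if isinstance(value, str):
        # A name: look up a template that an admin created in the cluster.
        if value not in cluster_templates:
            raise LookupError(f"pod template {value!r} not found in cluster")
        return dict(cluster_templates[value])
    if isinstance(value, Mapping):
        # An inline template written directly in the task definition.
        return dict(value)
    raise TypeError(f"expected a name or a mapping, got {type(value).__name__}")


cluster = {
    "v100_pod_template": {
        "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-tesla-v100"}
    }
}
by_name = resolve_pod_template("v100_pod_template", cluster)
inline = resolve_pod_template({"tolerations": [{"key": "gpu", "operator": "Exists"}]}, cluster)
```

Checking `str` before `Mapping` keeps the two cases unambiguous, and the name lookup fails loudly instead of silently falling back to a default.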
This is perfect, thanks 🙏 Since in this PR we apply the Pod template to kf operator task pods, will the pod template exposed in the task decorator also work for those task types, given that it will replace the default pod template? (Currently, our ML engineers cannot choose the GPU type for PyTorch tasks because they cannot configure a node selector. For Python tasks, we work around this with Pod tasks.)
Absolutely. I have not yet had a chance to figure out how this will work in the backend, but presumably we can overload LoadOrDefault with some task metadata (containing the
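The following Python sketch shows one way such an overload could behave, assuming task-level configuration takes precedence over the cluster default and that operator jobs such as a PyTorchJob give each replica type (for example `Master` and `Worker`) its own copy of the chosen base. The real logic lives in the Go backend; `TaskMetadata`, `load_or_default`, and `replica_pod_specs` here are hypothetical illustrations, not Flyte code.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskMetadata:
    """Hypothetical per-task metadata passed to the backend."""

    pod_template: Optional[dict] = None
    pod_template_name: Optional[str] = None


def load_or_default(
    metadata: TaskMetadata,
    cluster_templates: dict[str, dict],
    default_template: Optional[dict],
) -> Optional[dict]:
    """Pick the base template: inline template, then named template, then the default."""
    if metadata.pod_template is not None:
        return metadata.pod_template
    if metadata.pod_template_name is not None:
        return cluster_templates[metadata.pod_template_name]
    return default_template


def replica_pod_specs(base: Optional[dict], replica_types: list[str]) -> dict[str, dict]:
    """Give each replica type of an operator job an independent copy of the base."""
    return {role: copy.deepcopy(base or {}) for role in replica_types}


default = {"tolerations": []}
cluster = {
    "v100_pod_template": {
        "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-tesla-v100"}
    }
}
meta = TaskMetadata(pod_template_name="v100_pod_template")
specs = replica_pod_specs(load_or_default(meta, cluster, default), ["Master", "Worker"])
```

Under these assumptions, a PyTorch task that names a GPU template gets the same node selector on every replica, which is exactly what Pod tasks cannot provide for operator-managed pods.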
@hamersaw any updates here?
@flixr thanks for checking in on this! Sorry for the slow response; as you might imagine, many of us took some time to unwind over the holidays. We just discussed our Q1 plan, and this is a very high-priority item. I expect to address it in the next few weeks.
👍 Let me know if I can help out with testing or anything else.
Is this issue related to #3241?
By exposing the complete PodSpec, would users have control over, for example, the container name for a particular task? From reading the docs, I understand that the container name in the PodTemplate will be overridden by Flyte. This is in regard to @stephen37's question here
@xshen8888 this is indeed related and will address your problem, though passing it in via the CLI is still hard. I have an eventual goal of making the entire configuration externally configurable at pyflyte run time; maybe some brave soul can help :D
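The container-name question comes down to how the template's containers are merged with the one Flyte generates. A minimal sketch, assuming (as the docs are read above) that a template container with a placeholder name, `primary` here, serves as the base for the task container and that Flyte-set fields, including `name`, win on conflict; `merge_primary_container` and `PRIMARY_PLACEHOLDER` are hypothetical.

```python
import copy

# Hypothetical placeholder marking the template container that serves as the
# base for the task's primary container.
PRIMARY_PLACEHOLDER = "primary"


def merge_primary_container(template_spec: dict, task_container: dict) -> dict:
    """Overlay the Flyte-generated task container onto the template's placeholder.

    Fields set by the task (including "name") win; fields present only in the
    template (e.g. volumeMounts) are kept. Other containers are left untouched.
    """
    spec = copy.deepcopy(template_spec)
    containers = spec.setdefault("containers", [])
    for i, container in enumerate(containers):
        if container.get("name") == PRIMARY_PLACEHOLDER:
            containers[i] = {**container, **task_container}
            break
    else:
        # No placeholder in the template: just add the task container.
        containers.append(copy.deepcopy(task_container))
    return spec


template = {
    "containers": [
        {"name": "primary", "volumeMounts": [{"name": "data", "mountPath": "/data"}]},
        {"name": "sidecar", "image": "busybox"},
    ],
    "volumes": [{"name": "data", "emptyDir": {}}],
}
task_container = {"name": "flyte-generated-name", "image": "my-task-image:latest"}
merged = merge_primary_container(template, task_container)
```

Under these assumptions, fields such as `volumeMounts` survive from the template, but the container name does not, which matches the behavior described in the question.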
I will work on the flytekit part. Thanks to @wild-endeavor for offering me this chance.
@eapolinario to write "proto-docs" for this feature in flytesnacks somewhere
@cosmicBboy to create issues for Spark, Ray, and Dask
Do |
@lynxoid I do not believe so, but adding support should be a trivial addition to flytekit, since shell tasks are basically a lightweight wrapper. cc @wild-endeavor @eapolinario can you confirm this? If so, it would be a great contribution!
@hamersaw ShellTask inherits from PythonInstanceTask, which in turn inherits from PythonAutoContainerTask.
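If each constructor in that chain forwards unrecognized keyword arguments to its parent, a pod template accepted by the base class becomes available on `ShellTask` with no extra code. The classes below are simplified, hypothetical stand-ins that reuse flytekit's class names only to mirror the hierarchy; their signatures are not flytekit's.

```python
from __future__ import annotations

from typing import Any, Optional


class PythonAutoContainerTask:
    """Simplified stand-in: the base class is where pod template support lives."""

    def __init__(self, name: str, pod_template: Optional[dict] = None) -> None:
        self.name = name
        self.pod_template = pod_template


class PythonInstanceTask(PythonAutoContainerTask):
    """Simplified stand-in: forwards extra keyword arguments to its parent."""

    def __init__(self, name: str, **kwargs: Any) -> None:
        super().__init__(name=name, **kwargs)


class ShellTask(PythonInstanceTask):
    """Simplified stand-in: a lightweight wrapper that only adds the script."""

    def __init__(self, name: str, script: str, **kwargs: Any) -> None:
        super().__init__(name=name, **kwargs)
        self.script = script


t = ShellTask(name="echo", script="echo hi", pod_template={"tolerations": []})
```

If a real subclass does not forward such arguments, it would need to accept the parameter and pass it through explicitly.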
This feature has been integrated into the SDK and backend. To simplify support going forward, let's track bugs and feature requests as separate GitHub issues.
This issue is not yet fully complete; we are still waiting on the final integration into the Dask plugin.
Motivation: Why do you think this is important?
Many users require more complex k8s configuration within the Flyte ecosystem, such as task-specific tolerations, volume mounts, etc. Currently this is unsupported in flytekit, whose API favors ergonomics by abstracting the complexities of k8s configuration away from end users. However, some use cases require this complex configuration, and striking a balance is very difficult.
Goal: What should the final outcome look like, ideally?
Users should be able to configure every aspect of the k8s Pod that Flyte creates without bloating the task definition.
One solution is to add a PodTemplate configuration to every flytekit task definition. This would work similarly to the default PodTemplate scheme where this PodTemplate definition serves as the base in creating the k8s Pod. By default this will be empty and therefore not applied, but if it is specified it will serve as the base for Pod configuration. This API could look something like:
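As a hypothetical sketch of the base-template semantics described above (not the API snippet the proposal refers to), the function below treats an empty or missing template as "not applied" and otherwise starts from the template, letting the fields Flyte sets override it. `build_pod_spec` is an assumed name, dictionaries stand in for Kubernetes objects, and only top-level pod fields are merged; container-level merging is deliberately left out.

```python
import copy
from typing import Optional


def build_pod_spec(base_template: Optional[dict], flyte_spec: dict) -> dict:
    """Create the final pod spec, using the task's PodTemplate (if any) as the base.

    Hypothetical semantics: with no (or an empty) template, the Flyte-generated
    spec is used unchanged; otherwise top-level fields that Flyte sets override
    the template, and everything else (tolerations, volumes, ...) is kept.
    """
    if not base_template:
        return copy.deepcopy(flyte_spec)
    spec = copy.deepcopy(base_template)
    spec.update(copy.deepcopy(flyte_spec))
    return spec


base = {
    "tolerations": [{"key": "gpu", "operator": "Exists", "effect": "NoSchedule"}],
    "volumes": [{"name": "data", "emptyDir": {}}],
}
flyte = {"containers": [{"name": "task", "image": "my-task-image:latest"}], "restartPolicy": "Never"}
spec = build_pod_spec(base, flyte)
```

The appeal of this design is that new k8s options need no flytekit changes: anything Flyte does not set simply passes through from the template.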
Describe alternatives you've considered
Currently, proposals often include one-off configuration updates that solve a specific use case but do not generalize. This approach was used in the k8s plugin and has proved unmaintainable: every option requires code changes, and the underlying options may change between k8s versions.
Propose: Link/Inline OR Additional context
No response