feat: Dask add pod template support #374
Signed-off-by: Bernhard Stadlbauer <[email protected]>
go/tasks/plugins/k8s/dask/dask.go
-	jobRunnerContainer := v1.Container{
-		Name:      "job-runner",
-		Image:     defaultImage,
-		Args:      defaultContainerSpec.GetArgs(),
-		Env:       defaultEnvVars,
-		Resources: *containerResources,
-
-	templateParameters := template.Parameters{
-		TaskExecMetadata: taskCtx.TaskExecutionMetadata(),
-		Inputs:           taskCtx.InputReader(),
-		OutputPath:       taskCtx.OutputWriter(),
-		Task:             taskCtx.TaskReader(),
-
-	if err = flytek8s.AddFlyteCustomizationsToContainer(ctx, templateParameters,
-		flytek8s.ResourceCustomizationModeMergeExistingResources, &jobRunnerContainer); err != nil {
-
-		return nil, err
+func removeInterruptibleConfig(spec *v1.PodSpec, taskCtx pluginsCore.TaskExecutionContext) {
+	if !taskCtx.TaskExecutionMetadata().IsInterruptible() {
+		return
+	}
+
+	// Tolerations
+	interruptlibleTolerations := config.GetK8sPluginConfig().InterruptibleTolerations
+	newTolerations := []v1.Toleration{}
+	for _, toleration := range spec.Tolerations {
+		if !slices.Contains(interruptlibleTolerations, toleration) {
+			newTolerations = append(newTolerations, toleration)
+		}
+	}
+	spec.Tolerations = newTolerations
+
+	// Node selectors
+	interruptibleNodeSelector := config.GetK8sPluginConfig().InterruptibleNodeSelector
+	for key := range spec.NodeSelector {
+		if _, ok := interruptibleNodeSelector[key]; ok {
+			delete(spec.NodeSelector, key)
+		}
+	}
+
+	// Node selector requirements
+	interruptibleNodeSelectorRequirements := config.GetK8sPluginConfig().InterruptibleNodeSelectorRequirement
+	nodeSelectorTerms := spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms
+	for i := range nodeSelectorTerms {
+		nst := &nodeSelectorTerms[i]
+		matchExpressions := nst.MatchExpressions
+		newMatchExpressions := []v1.NodeSelectorRequirement{}
+		for _, matchExpression := range matchExpressions {
+			if !nodeSelectorRequirementsAreEqual(matchExpression, *interruptibleNodeSelectorRequirements) {
+				newMatchExpressions = append(newMatchExpressions, matchExpression)
+			}
+		}
+		nst.MatchExpressions = newMatchExpressions
+	}
+}
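To experiment with the "remove after adding" filtering outside Flyte, here is a minimal, self-contained sketch. It is not the plugin's code: `Toleration`, `PodSpec`, `NodeSelector` and friends are simplified local stand-ins for the `k8s.io/api/core/v1` types, and `InterruptibleConfig` is a hypothetical struct holding the values that `config.GetK8sPluginConfig()` would supply. One deliberate difference from the diff: the function above dereferences `spec.Affinity.NodeAffinity...` and `*interruptibleNodeSelectorRequirements` without nil checks, so this sketch guards every pointer hop first.

```go
package main

import (
	"fmt"
	"slices"
)

// Simplified stand-ins for k8s core/v1 types (assumption: only the fields
// needed for this sketch are modelled).
type Toleration struct {
	Key, Operator, Value, Effect string
}

type NodeSelectorRequirement struct {
	Key      string
	Operator string
	Values   []string
}

type NodeSelectorTerm struct{ MatchExpressions []NodeSelectorRequirement }

type NodeSelector struct{ NodeSelectorTerms []NodeSelectorTerm }

type NodeAffinity struct {
	RequiredDuringSchedulingIgnoredDuringExecution *NodeSelector
}

type Affinity struct{ NodeAffinity *NodeAffinity }

type PodSpec struct {
	Tolerations  []Toleration
	NodeSelector map[string]string
	Affinity     *Affinity
}

// InterruptibleConfig is a hypothetical stand-in for the interruptible
// settings read from the K8s plugin config.
type InterruptibleConfig struct {
	Tolerations             []Toleration
	NodeSelector            map[string]string
	NodeSelectorRequirement *NodeSelectorRequirement
}

func requirementsEqual(a, b NodeSelectorRequirement) bool {
	return a.Key == b.Key && a.Operator == b.Operator && slices.Equal(a.Values, b.Values)
}

// removeInterruptible strips interruptible tolerations, node selector keys and
// node selector requirements from spec, returning early on nil pointers.
func removeInterruptible(spec *PodSpec, cfg InterruptibleConfig) {
	// Keep only tolerations that are not part of the interruptible config.
	kept := []Toleration{}
	for _, t := range spec.Tolerations {
		if !slices.Contains(cfg.Tolerations, t) {
			kept = append(kept, t)
		}
	}
	spec.Tolerations = kept

	// Deleting map entries while ranging over the same map is allowed in Go.
	for key := range spec.NodeSelector {
		if _, ok := cfg.NodeSelector[key]; ok {
			delete(spec.NodeSelector, key)
		}
	}

	// Guard every pointer before walking the node selector terms.
	if cfg.NodeSelectorRequirement == nil || spec.Affinity == nil ||
		spec.Affinity.NodeAffinity == nil ||
		spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution == nil {
		return
	}
	terms := spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms
	for i := range terms {
		exprs := []NodeSelectorRequirement{}
		for _, e := range terms[i].MatchExpressions {
			if !requirementsEqual(e, *cfg.NodeSelectorRequirement) {
				exprs = append(exprs, e)
			}
		}
		terms[i].MatchExpressions = exprs // terms shares the backing array, so this edits spec
	}
}

func main() {
	spot := Toleration{Key: "spot", Operator: "Exists", Effect: "NoSchedule"}
	spec := &PodSpec{
		Tolerations:  []Toleration{spot, {Key: "gpu", Operator: "Exists"}},
		NodeSelector: map[string]string{"spot": "true", "zone": "a"},
		// Affinity is left nil: the sketch returns early instead of panicking.
	}
	cfg := InterruptibleConfig{
		Tolerations:             []Toleration{spot},
		NodeSelector:            map[string]string{"spot": "true"},
		NodeSelectorRequirement: &NodeSelectorRequirement{Key: "spot", Operator: "In", Values: []string{"true"}},
	}
	removeInterruptible(spec, cfg)
	fmt.Println(spec.Tolerations, spec.NodeSelector)
}
```

The filtering itself mirrors the diff; only the nil guards and the stand-in types are additions.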
@hamersaw I'm not too keen on this removal function; I'm happy to change `ToK8sPodSpec` so that it doesn't add the config in the first place.
An alternative solution would be to create a wrapper over the `TaskExecutionContext` / `TaskExecutionMetadata` which forces `interruptible` to be false. This would stop them from ever being injected without adding more parameters. Something like:
type daskTaskExecutionContext struct {
	taskExecutionContext
	metadata TaskExecutionMetadata
}

func (d *daskTaskExecutionContext) GetMetadata() TaskExecutionMetadata {
	return d.metadata
}

type daskTaskExecutionMetadata struct {
	taskExecutionMetadata
}

func (d *daskTaskExecutionMetadata) IsInterruptible() bool {
	return false
}
The function and type names are all wrong (I don't know them off the top of my head), but the idea is basically to force `interruptible` to always be false for dask tasks.
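A compilable sketch of the wrapper idea, using Go struct embedding so that every method is forwarded to the original except the one being overridden. The interfaces here are hypothetical, trimmed-down stand-ins for Flyte's `pluginsCore.TaskExecutionContext` and `pluginsCore.TaskExecutionMetadata` (the real ones have many more methods, and `GetNamespace` is included only to show forwarding); the `fake*` types exist purely for the demo.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for Flyte's pluginsCore interfaces.
type TaskExecutionMetadata interface {
	IsInterruptible() bool
	GetNamespace() string
}

type TaskExecutionContext interface {
	TaskExecutionMetadata() TaskExecutionMetadata
}

// daskTaskExecutionMetadata embeds the original metadata, so all methods are
// promoted unchanged except IsInterruptible, which is overridden.
type daskTaskExecutionMetadata struct {
	TaskExecutionMetadata
}

func (daskTaskExecutionMetadata) IsInterruptible() bool { return false }

// daskTaskExecutionContext embeds the original context and shadows
// TaskExecutionMetadata() to return the wrapped metadata instead.
type daskTaskExecutionContext struct {
	TaskExecutionContext
	metadata daskTaskExecutionMetadata
}

func (d daskTaskExecutionContext) TaskExecutionMetadata() TaskExecutionMetadata {
	return d.metadata
}

func newDaskTaskExecutionContext(tCtx TaskExecutionContext) daskTaskExecutionContext {
	return daskTaskExecutionContext{
		TaskExecutionContext: tCtx,
		metadata:             daskTaskExecutionMetadata{tCtx.TaskExecutionMetadata()},
	}
}

// Demo-only implementations of the interfaces above.
type fakeMetadata struct{ interruptible bool }

func (f fakeMetadata) IsInterruptible() bool { return f.interruptible }
func (f fakeMetadata) GetNamespace() string  { return "flyte-dev" }

type fakeContext struct{ md fakeMetadata }

func (f fakeContext) TaskExecutionMetadata() TaskExecutionMetadata { return f.md }

func main() {
	orig := fakeContext{md: fakeMetadata{interruptible: true}}
	wrapped := newDaskTaskExecutionContext(orig)
	fmt.Println(orig.TaskExecutionMetadata().IsInterruptible())    // true
	fmt.Println(wrapped.TaskExecutionMetadata().IsInterruptible()) // false
	fmt.Println(wrapped.TaskExecutionMetadata().GetNamespace())    // flyte-dev
}
```

Assuming the pod-spec helper only learns about interruptibility through `TaskExecutionMetadata().IsInterruptible()`, passing the wrapper in place of the original context means the interruptible tolerations and node selectors are never added, so no after-the-fact removal is needed.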
That sounds good to me - I'll try to implement that!
@hamersaw I'm done with the work on this but haven't smoke-tested it in a local cluster yet; I should hopefully get to that tomorrow. The test and lint failures are a bit puzzling to me, as both pass locally and they seem to flag something that's not even in the code (e.g.
@bstadlbauer have you had a chance to test this locally yet? Everything looks good to me, but it's always nice to double-check. As for the CI checks, I'm not quite sure what's going on; I'll plan on addressing them as we iterate on this.
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##           master     #374      +/-   ##
==========================================
+ Coverage   62.91%   64.10%   +1.19%
==========================================
  Files         156      156
  Lines       13154    10654    -2500
==========================================
- Hits         8276     6830    -1446
+ Misses       4257     3200    -1057
- Partials      621      624       +3

Flags with carried forward coverage won't be shown.
@hamersaw This should be ready from my end now; again, sorry for the incredibly long delay here. I've swapped the … In addition, I've written a small end-to-end test suite in https://github.com/bstadlbauer/flyte-dev-setup that can automatically test the plugin end to end. This should hopefully shorten iteration time on future changes, as testing was what mostly blocked me here. I also found and fixed a small bug along the way: an incorrect restart policy caused by a dask-kubernetes update.
* Add failing test
* WIP
* Improve test
* Refactor to use `ToK8sPodSpec`
* Fix linting
* Use `Always` restart policy for workers
* Add test which checks whether labels are propagated
* Replace `removeInterruptibleConfig` with `TaskExecutionMetadata` wrapper

---------

Signed-off-by: Bernhard Stadlbauer <[email protected]>
TL;DR
This PR changes the `dask` plugin to use `ToK8sPodSpec` to create the base pod template used for all `dask` components. This is consistent with other plugins (such as the `ray` plugin) and enables common features such as pod templates.
Type
Are all requirements met?
Complete description
This was pretty straightforward. The only downside was that I had to remove the interruptible config from the pod spec after it had been added, as I couldn't see a way to keep `ToK8sPodSpec` from adding it.
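A rough sketch of the pattern in the TL;DR: build one base pod spec, then derive each dask component's spec from a deep copy of it so per-component edits don't leak into the others. The `PodSpec` type is a simplified, hypothetical stand-in for `v1.PodSpec`, `buildDaskSpecs` is an illustrative name rather than the plugin's function, and the base spec stands in for whatever `ToK8sPodSpec` returns. The worker override reflects the commit "Use `Always` restart policy for workers"; the other components simply inherit the base.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// PodSpec is a simplified stand-in for k8s core/v1 PodSpec.
type PodSpec struct {
	RestartPolicy string
	NodeSelector  map[string]string
	Tolerations   []string
}

// clone deep-copies the map and slice fields so components don't share them.
func (p PodSpec) clone() PodSpec {
	return PodSpec{
		RestartPolicy: p.RestartPolicy,
		NodeSelector:  maps.Clone(p.NodeSelector),
		Tolerations:   slices.Clone(p.Tolerations),
	}
}

// DaskSpecs holds one pod spec per dask component (hypothetical layout).
type DaskSpecs struct {
	JobRunner, Scheduler, Worker PodSpec
}

// buildDaskSpecs derives every component spec from the same base spec.
func buildDaskSpecs(base PodSpec) DaskSpecs {
	worker := base.clone()
	worker.RestartPolicy = "Always" // workers get the Always restart policy
	return DaskSpecs{JobRunner: base.clone(), Scheduler: base.clone(), Worker: worker}
}

func main() {
	base := PodSpec{RestartPolicy: "Never", NodeSelector: map[string]string{"team": "ml"}}
	specs := buildDaskSpecs(base)
	specs.Scheduler.NodeSelector["role"] = "scheduler" // does not touch base or worker
	fmt.Println(specs.Worker.RestartPolicy, len(base.NodeSelector), len(specs.Worker.NodeSelector))
}
```

Deriving everything from a single base is what lets features applied once (pod templates, labels, resources) reach all components.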