Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Change kubeflow plugins to allow settings specs for different replica #345

Merged
merged 11 commits into from
May 9, 2023

Conversation

yubofredwang
Copy link
Contributor

@yubofredwang yubofredwang commented Apr 21, 2023

TL;DR

Change kubeflow plugins to allow settings specs for different replica. The settings include: images, resources, command(mpi only), restart policy for different replica groups.

Corresponding PR in flyteidl: flyteorg/flyteidl#386

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Tracking Issue

fixes flyteorg/flyte#3308

@welcome
Copy link

welcome bot commented Apr 21, 2023

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@kumare3
Copy link
Contributor

kumare3 commented Apr 21, 2023

cc @fg91 you are also working on this file

@yubofredwang yubofredwang marked this pull request as draft April 21, 2023 17:01
@yubofredwang yubofredwang force-pushed the refactor_kfoperators branch from 3a53e39 to 4da8c4f Compare April 25, 2023 21:44
Signed-off-by: Yubo Wang <[email protected]>
@yubofredwang yubofredwang changed the title [Don't Review] change pytorch plugin to accept new pytorch task idl Change kubeflow plugins to allow settings specs for different replica Apr 27, 2023
Signed-off-by: Yubo Wang <[email protected]>
@yubofredwang yubofredwang force-pushed the refactor_kfoperators branch from 28b5e43 to 9626895 Compare April 27, 2023 07:10
@yubofredwang yubofredwang marked this pull request as ready for review April 27, 2023 07:10
Copy link
Contributor

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only nit - can we add docs to the function headers? It's something we trying to be more strict about.

@codecov
Copy link

codecov bot commented May 8, 2023

Codecov Report

Merging #345 (5729058) into master (76a80ec) will increase coverage by 1.37%.
The diff coverage is 76.92%.

❗ Current head 5729058 differs from pull request most recent head 3b79d77. Consider uploading reports for the commit 3b79d77 to get more accurate results

@@            Coverage Diff             @@
##           master     #345      +/-   ##
==========================================
+ Coverage   62.76%   64.13%   +1.37%     
==========================================
  Files         148      148              
  Lines       12444    10269    -2175     
==========================================
- Hits         7810     6586    -1224     
+ Misses       4038     3074     -964     
- Partials      596      609      +13     
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../plugins/k8s/kfoperators/common/common_operator.go 65.94% <33.33%> (-5.99%) ⬇️
go/tasks/plugins/k8s/kfoperators/mpi/mpi.go 75.00% <81.25%> (+0.80%) ⬆️
...s/plugins/k8s/kfoperators/tensorflow/tensorflow.go 77.77% <84.14%> (+3.02%) ⬆️
...o/tasks/plugins/k8s/kfoperators/pytorch/pytorch.go 77.62% <87.80%> (-1.91%) ⬇️

... and 127 files with indirect coverage changes

hamersaw
hamersaw previously approved these changes May 9, 2023
Signed-off-by: Yubo Wang <[email protected]>
@hamersaw hamersaw merged commit 9a2bbba into flyteorg:master May 9, 2023
@welcome
Copy link

welcome bot commented May 9, 2023

Congrats on merging your first pull request! 🎉

@yubofredwang yubofredwang deleted the refactor_kfoperators branch May 9, 2023 23:17
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
…#345)

* change pytorch plugin to accept new pytorch task idl

Signed-off-by: Yubo Wang <[email protected]>

* merge elastic config in

Signed-off-by: Yubo Wang <[email protected]>

* add unit tests for pytorch

Signed-off-by: Yubo Wang <[email protected]>

* add tfjob

Signed-off-by: Yubo Wang <[email protected]>

* add mpi job

Signed-off-by: Yubo Wang <[email protected]>

* add test to commone operator

Signed-off-by: Yubo Wang <[email protected]>

* update flyteidl

Signed-off-by: Yubo Wang <[email protected]>

* add function header comments

Signed-off-by: Yubo Wang <[email protected]>

* fix lint

Signed-off-by: Yubo Wang <[email protected]>

---------

Signed-off-by: Yubo Wang <[email protected]>
Co-authored-by: Yubo Wang <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core feature] Allow setting separate resource configs for different replica types in Kubeflow Jobs
3 participants