This repository has been archived by the owner on Oct 9, 2023. It is now read-only.
Feat: Configure elastic training in pytorch plugin #343
Merged
Signed-off-by: Fabio Grätz <[email protected]>
Signed-off-by: Fabio Grätz <[email protected]>
fg91 force-pushed the fabio/feat/torch-elastic-plugin branch from e1a5a7e to b644427 on April 22, 2023 18:03
Tests are failing since flyteidl needs to be updated first.
Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Ketan Umare <[email protected]>
Codecov Report
@@            Coverage Diff             @@
##           master     #343      +/-   ##
==========================================
+ Coverage   62.64%   64.07%   +1.43%
==========================================
  Files         148      148
  Lines       12397    10072    -2325
==========================================
- Hits         7766     6454    -1312
+ Misses       4036     3023    -1013
  Partials      595      595
... and 130 files with indirect coverage changes
Signed-off-by: Ketan Umare <[email protected]>
kumare3 approved these changes on Apr 24, 2023
TL;DR
This PR modifies the PyTorch plugin so that it can set an ElasticPolicy in the Kubeflow PyTorchJob when a user configures torch elastic training (torchrun) in the task decorator. See this issue for motivation and more details.
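To make the idea concrete, here is a minimal Python sketch of the kind of mapping described above: elastic settings supplied by the user are turned into an elastic policy section of a job spec, and the section is omitted when elastic training is not configured. This is not the plugin's actual code (the plugin lives in a Go repository). `ElasticConfig`, `to_elastic_policy`, and `build_job_spec` are hypothetical names, and the dictionary keys are assumed to follow the Kubeflow training operator's field naming.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ElasticConfig:
    """Hypothetical stand-in for the elastic settings a user supplies."""

    min_replicas: int
    max_replicas: int
    nproc_per_node: int
    max_restarts: int = 0
    rdzv_backend: str = "c10d"


def to_elastic_policy(cfg: ElasticConfig) -> Dict[str, Any]:
    """Translate the user-facing config into an elastic policy mapping.

    Key names are an assumption modelled on the Kubeflow PyTorchJob spec.
    """
    if cfg.min_replicas < 1 or cfg.max_replicas < cfg.min_replicas:
        raise ValueError("expected 1 <= min_replicas <= max_replicas")
    return {
        "minReplicas": cfg.min_replicas,
        "maxReplicas": cfg.max_replicas,
        "nProcPerNode": cfg.nproc_per_node,
        "maxRestarts": cfg.max_restarts,
        "rdzvBackend": cfg.rdzv_backend,
    }


def build_job_spec(cfg: Optional[ElasticConfig]) -> Dict[str, Any]:
    """Attach an elastic policy only when elastic training was configured."""
    spec: Dict[str, Any] = {"pytorchReplicaSpecs": {}}
    if cfg is not None:
        spec["elasticPolicy"] = to_elastic_policy(cfg)
    return spec
```

Calling `build_job_spec(None)` leaves the spec without an `elasticPolicy` key, which mirrors the "only in case the user configures elastic training" behaviour described in the TL;DR.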
Type
Are all requirements met?
Complete description
Tracking Issue
Fixes flyteorg/flyte#3614
Follow-up issue