use non_reentrant_checkpoint
fix requires_grad of input must be true for activation checkpoint layer in pipeline train.
#4224
…af forward tensor refs
* Pass correct node size * formatting --------- Co-authored-by: Connor Holmes <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
* add deepspeed chat arxiv report * add zeroquant v2 and fp * add selective enhencement * add ignore for 'Youn' in spell checker --------- Co-authored-by: yaozhewei <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
…use and add regression tests
@inkcherry Thank you for submitting this as a new PR! Sorry for the delay in my review.
Hi @tohtana,
Hi @inkcherry, I could reproduce the same error (AssertionError on line 109). How did you run the test? I ran the following: pytest test_pipe.py::TestPipeCifar10::test_pipe_use_reentrant[topo_config0]
Hi @tohtana, thank you very much for helping me find the cause. It seems the version in my testing environment is a bit higher than the one in CI.
I have relaxed the check from parameter equality to loss convergence, and the test now passes. The parameter equality check could be restored once the version is updated.
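To illustrate the difference between the two check levels mentioned above, here is a minimal sketch in plain Python. It is not the code from test_pipe.py: the helper names `params_equal` and `loss_converges`, the tolerance, the window size, and the sample numbers are all hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def params_equal(a: Sequence[float], b: Sequence[float], rel_tol: float = 1e-6) -> bool:
    """Strict check (hypothetical helper): every parameter must match element-wise."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=1e-12) for x, y in zip(a, b)
    )


def loss_converges(losses: Sequence[float], window: int = 3) -> bool:
    """Looser check (hypothetical helper): the mean loss over the last `window`
    steps must be lower than the mean loss over the first `window` steps."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    head = sum(losses[:window]) / window
    tail = sum(losses[-window:]) / window
    return tail < head


# Made-up numbers standing in for two runs (reentrant vs. non-reentrant checkpointing).
reentrant_params = [0.5012, -1.2003]
non_reentrant_params = [0.5019, -1.1998]
reentrant_losses = [2.31, 2.05, 1.80, 1.42, 1.10, 0.95]
non_reentrant_losses = [2.30, 2.07, 1.79, 1.45, 1.12, 0.93]

# The strict comparison fails on small numerical drift between the runs ...
strict_ok = params_equal(reentrant_params, non_reentrant_params)
# ... while both runs still show a decreasing loss.
loose_ok = loss_converges(reentrant_losses) and loss_converges(non_reentrant_losses)
```

With these sample values the strict comparison rejects the two runs while the convergence check accepts both, which is the kind of relaxation described in the comment above. A convergence check is weaker, so it can miss subtle regressions that an exact comparison would catch.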
Thank you @inkcherry, the changes look good to me and I approved this PR.
It is not surprising that the updated parameters do not exactly match without setting options for reproducibility.
Let's merge this PR after the CI tests completed.
From the original PR #4128, rebased on the feature proposed in #4118. This should be reviewed after #4118 is merged.
@tohtana @tjruwase @hughpu