[hybrid] pp+dp support fp16 allreduce #34762
Conversation
LGTM
""" | ||
Create a new merged gradient for each parameter and accumulate the | ||
corresponding gradient to it. | ||
""" | ||
merged_gradient_names = [] | ||
first_opt_op_idx = None | ||
|
||
merged_suffix = '@MERGED@FP16' if fp16_allreduce else '@MERGED' |
We should explain the suffixes in the grad names for later maintainers; we now have too many suffixes for gradients: immediate grad, accumulated grad, casted grad, etc.
OK, will add it in the next PR.
LGTM
PR types
Performance optimization
PR changes
Others
Describe
Add fp16 allreduce support for hybrid (pp+dp) parallelism. Usage (see the sketch below):
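A minimal configuration sketch of how the option is assumed to be enabled through `paddle.distributed.fleet.DistributedStrategy`; the flag names here are assumptions based on the fleet API, not copied from this PR:

```python
import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()
strategy.pipeline = True                      # pipeline parallelism (pp); dp comes from extra ranks
strategy.pipeline_configs = {
    "accumulate_steps": 32,                   # illustrative gradient-accumulation steps
    "micro_batch_size": 2,                    # illustrative micro batch size
}
strategy.fp16_allreduce = True                # assumed switch: cast gradients to fp16 for allreduce
strategy.fuse_grad_size_in_MB = 128           # fused-allreduce bucket size, matching the test below

# The strategy is then passed to fleet.distributed_optimizer(...) and the
# job is launched with fleetrun across the pp+dp ranks.
```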
Test
Tested on 16 nodes × 8 cards of 32 GB V100 GPUs, with the Ernie 3.0 model.
Model config:
Hybrid configs, with a 128 MB fused-allreduce bucket size:
Performance: