
[hybrid] pp+dp support fp16 allreduce #34762

Merged
merged 4 commits into PaddlePaddle:develop on Aug 11, 2021

Conversation

wangxicoding (Contributor) commented Aug 10, 2021

PR types

Performance optimization

PR changes

Others

Describe

Hybrid parallelism (pp+dp) now supports fp16 allreduce. Usage:

import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()
strategy.sharding = True
strategy.sharding_configs = {
    "sharding_degree": 1,   # no sharding
    "mp_degree": 1,         # no tensor (model) parallelism
    "pp_degree": 2,         # 2-stage pipeline parallelism
    "dp_degree": 2,         # 2-way data parallelism
}
strategy.pipeline = True
strategy.pipeline_configs = {
    "schedule_mode": "1F1B",     # one-forward-one-backward schedule
    "micro_batch_size": 2,
    "accumulate_steps": 4,       # micro-batches accumulated per step
}
strategy.amp = True              # mixed-precision (fp16) training
strategy.fp16_allreduce = True   # allreduce gradients in fp16 across dp ranks
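
For context, here is a minimal sketch (not part of this PR) of how such a strategy is typically hooked up with fleet in static-graph mode. The optimizer choice is arbitrary, and building the network, annotating pipeline stages, and launching one process per GPU are omitted:

import paddle
import paddle.distributed.fleet as fleet

# Hybrid sharding/pipeline configs take effect in static-graph training.
paddle.enable_static()
fleet.init(is_collective=True, strategy=strategy)

optimizer = paddle.optimizer.Momentum(learning_rate=0.01)
optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
# optimizer.minimize(loss) would then rewrite the program with the
# pipeline/data-parallel passes, including fp16 gradient allreduce
# when fp16_allreduce is enabled.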

Test

Tested on 16 nodes with 8 32GB V100 cards each (128 GPUs), with the Ernie3.0 model.

Model config:

config                        value
hidden size                   8192
num attention heads           128
num hidden layers             76
num sharing layers            64
branch hidden size            768
branch num attention heads    16

Hybrid configs, with a fused allreduce buffer size of 128MB:

dp   mp   pp   micro bsz   global bsz
2    8    8    2           256

Performance:

fp16_allreduce   throughput (tokens/s)   improve
false            13394                   -
true             15224                   13.6%
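
Here improve denotes the relative throughput gain, (15224 - 13394) / 13394 × 100%.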

@JZ-LIANG (Contributor) left a comment:
LGTM

"""
Create a new merged gradient for each parameter and accumulate the
corresponding gradient to it.
"""
merged_gradient_names = []
first_opt_op_idx = None

merged_suffix = '@MERGED@FP16' if fp16_allreduce else '@MERGED'
Contributor commented:

We should explain the gradient-name suffixes for later maintainers; we now have too many suffixes for grads: the immediate grad, accumulated grad, casted grad, etc.

wangxicoding (Contributor, Author) replied:

OK, will add it in the next PR.
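
For readers following the suffix discussion above, a rough illustrative sketch of how the merged-gradient names compose (the helper below is hypothetical, not code from this PR):

def merged_grad_name(param_name, fp16_allreduce):
    # Immediate gradient of a parameter, e.g. "linear_0.w_0@GRAD".
    grad_name = param_name + "@GRAD"
    # Accumulated (merged) gradient across micro-batches; kept in fp16 when
    # fp16_allreduce is on, so the data-parallel allreduce moves half the bytes.
    suffix = "@MERGED@FP16" if fp16_allreduce else "@MERGED"
    return grad_name + suffix

print(merged_grad_name("linear_0.w_0", True))   # linear_0.w_0@GRAD@MERGED@FP16
print(merged_grad_name("linear_0.w_0", False))  # linear_0.w_0@GRAD@MERGED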

@sandyhouse left a comment:

LGTM

@wangxicoding wangxicoding merged commit 4d7af37 into PaddlePaddle:develop Aug 11, 2021
@wangxicoding wangxicoding deleted the hybird_fp16_allreduce branch August 11, 2021 07:31