[HybridParallel]Add Recompute for PipeLineParallel #34607

ForFishes · 2021-08-04T08:31:00Z

PR types

New features

PR changes

Others

Describe

Add Recompute for PipeLineParallel

1、接口形式

class PipelineLayer(Layer):
    def __init__(self,
                 layers,
                 num_stages=None,
                 topology=None,
                 loss_fn=None,
                 seg_method="uniform",
                 recompute_interval=0,
                 recompute_offload=False,
                 recompute_partition=False):

2、功能支持

相比paddle原生的recompute，有以下几处不同：

支持PipeLineParallel并行，通过修改recompute的输入形式进行适配。
支持offload功能。
支持MP模式下，checkpoint的进一步裁剪，减少显存占用。
适配混合并行下，随机性控制。

3、性能对比

GPT-117M模型，V100-32G, FP32，MP=4, PP=2， mircrobatch=2， global_batch_size=128,中间卡显存

实验配置	显存占用(M)	性能速度(tokens/s)
w/o recompute	4999	54584
recompute	3531	43251
recompute + offload	3489	22529
recompute + MP切分	3473	35842
recompute + offload + MP切分	3495	28078

?? recompute + offload + MP切分的组合显存相比更大？
nvidia-smi显示的显存，可以已经释放但被paddle缓存住了。

4、精度对比
在GPT-117M，MP2_PP2下验证精度

DP2_MP2_PP2 + AMP

5、TODO

减少显存碎片

paddle-bot-old · 2021-08-04T08:31:04Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

JZ-LIANG · 2021-08-05T12:36:28Z

python/paddle/distributed/fleet/meta_parallel/pp_utils/utils.py

+                    ctx.tensor_shapes.append(arg.shape)
+                    partition = _split_activation(arg.detach()).clone()
+                    # TODO(shenliang03) not use calculate stream to D2H to speed
+                    arg = partition.cpu() if _recompute_offload else partition


Offload in dygraph is Sooooo easy！！！ lol

I think the fleet.utils.recompute could do in the same way

yes, currently support hybrid_parallel first.

JZ-LIANG · 2021-08-05T12:42:21Z

python/paddle/distributed/fleet/meta_parallel/pp_utils/utils.py

+                        tensor_shapes[i])
+                    tensors[i].stop_gradient = state
+                inputs[idx] = tensors[i].cuda(
+                    device_id) if _recompute_offload else tensors[i]


should sync here ？

wait the H2D copy finish before conduct the following computation

cpu() is sync operation, we don't need do this

JZ-LIANG

LGTM

JZ-LIANG

LGTM

zhiqiu

LGTM for op_function_generator

ForFishes force-pushed the add_recompute_for_hybrid branch from f28cc8a to ae6ac75 Compare August 5, 2021 03:06

JZ-LIANG reviewed Aug 5, 2021

View reviewed changes

JZ-LIANG previously approved these changes Aug 5, 2021

View reviewed changes

ForFishes dismissed JZ-LIANG’s stale review via 88ee4ad August 9, 2021 03:01

ForFishes force-pushed the add_recompute_for_hybrid branch from f8eedf2 to 88ee4ad Compare August 9, 2021 03:01

ForFishes added 11 commits August 11, 2021 14:16

add recompute for pp

e63587d

add recompute

b0d080a

add recompute offload

4e9882e

add recompute partition

5ebb2cc

fix p2p

64b92e6

add utest for recompute

a90591c

fix conflict

3a21f4e

fix bug

e37672d

fix bug

a88de33

fix recompute

8b177af

add utest

5e50e53

ForFishes force-pushed the add_recompute_for_hybrid branch from 88ee4ad to 5e50e53 Compare August 11, 2021 09:02

fix conflict

6f4fed2

JZ-LIANG approved these changes Aug 12, 2021

View reviewed changes

zhiqiu approved these changes Aug 12, 2021

View reviewed changes

ForFishes merged commit 589d13c into PaddlePaddle:develop Aug 12, 2021

ForFishes deleted the add_recompute_for_hybrid branch August 12, 2021 03:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[HybridParallel]Add Recompute for PipeLineParallel #34607

[HybridParallel]Add Recompute for PipeLineParallel #34607

ForFishes commented Aug 4, 2021 •

edited

Loading

paddle-bot-old bot commented Aug 4, 2021

JZ-LIANG Aug 5, 2021

JZ-LIANG Aug 5, 2021

ForFishes Aug 11, 2021 •

edited

Loading

JZ-LIANG Aug 5, 2021

JZ-LIANG Aug 5, 2021

ForFishes Aug 11, 2021

JZ-LIANG left a comment

JZ-LIANG left a comment

zhiqiu left a comment

[HybridParallel]Add Recompute for PipeLineParallel #34607

[HybridParallel]Add Recompute for PipeLineParallel #34607

Conversation

ForFishes commented Aug 4, 2021 • edited Loading

PR types

PR changes

Describe

paddle-bot-old bot commented Aug 4, 2021

JZ-LIANG Aug 5, 2021

Choose a reason for hiding this comment

JZ-LIANG Aug 5, 2021

Choose a reason for hiding this comment

ForFishes Aug 11, 2021 • edited Loading

Choose a reason for hiding this comment

JZ-LIANG Aug 5, 2021

Choose a reason for hiding this comment

JZ-LIANG Aug 5, 2021

Choose a reason for hiding this comment

ForFishes Aug 11, 2021

Choose a reason for hiding this comment

JZ-LIANG left a comment

Choose a reason for hiding this comment

JZ-LIANG left a comment

Choose a reason for hiding this comment

zhiqiu left a comment

Choose a reason for hiding this comment

ForFishes commented Aug 4, 2021 •

edited

Loading

ForFishes Aug 11, 2021 •

edited

Loading