[BUG] Pipeline Parallelism cannot work with BFloat16 + Optimizer offload #3866
@SparkJiao, this is correct. Offloading is not enabled for pipeline parallelism.
I see, thanks. But will there be any unexpected behaviour when enabling optimizer offload with pipeline parallelism? I didn't notice anything wrong in my current training (fp16 + optimizer offload on LLaMA-65B). I'm also not sure about the possible risk, i.e., why optimizer offload is designed not to be supported with pipeline parallelism.
@SparkJiao, sorry for the confusion. I was trying to say that the bf16_optimizer.py implementation does not include the offloading feature at all. We have not previously tested offloading together with pipeline parallelism, so I don't know whether there are any issues. However, your results are promising evidence that the two can be combined. Can you share more details of your results and what you are trying to do?
@tjruwase Thanks for your reply. Currently I can successfully complete the training procedure with optimizer offload and gradient checkpointing using fp16. I just wanted to use bf16 in the same procedure. I have created a repo with my implementation of DeepSpeed pipeline-parallel training, so you may check it here. I think offload is necessary when you only have 8 * 80GB A100s to train LLaMA-65B. (I have tried an 8-stage pipeline with 2 data-parallel groups, 16 GPUs in total, but it failed when optimizer offload was disabled.)
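For context, a minimal sketch of the kind of setup being described (not the poster's actual code): an 8-stage DeepSpeed pipeline with fp16 plus optimizer offload under ZeRO stage 1, the highest ZeRO stage the pipeline engine supports. The layer stack, batch sizes, and learning rate below are placeholders.

```python
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()

# Placeholder model: a stack of identical layers partitioned across 8 stages;
# activation_checkpoint_interval enables gradient checkpointing per layer.
layers = [nn.Linear(1024, 1024) for _ in range(32)]
model = PipelineModule(layers=layers, num_stages=8,
                       activation_checkpoint_interval=1)

# fp16 + optimizer offload: the combination reported to train successfully.
ds_config = {
    "train_batch_size": 16,               # illustrative
    "train_micro_batch_size_per_gpu": 1,  # illustrative
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,  # pipeline parallelism is only compatible with stage <= 1
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

engine, _, _, _ = deepspeed.initialize(model=model,
                                       model_parameters=model.parameters(),
                                       config=ds_config)
# Each call runs forward/backward over the micro-batches and takes one step:
# loss = engine.train_batch(data_iter=train_iter)
```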
@SparkJiao, can you share simple steps to repro the failure you are seeing with bf16 + offload? Thanks!
When combining bfloat16 with optimizer offload, I get the following error:
The deepspeed config is as follows:
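A minimal config of this shape, with bf16 enabled alongside an offload_optimizer subconfig (all values are illustrative, not the exact config from the report):

```python
# Illustrative DeepSpeed config combining bf16 with optimizer offload;
# this is the combination that triggers the reported failure.
ds_config = {
    "train_batch_size": 16,               # illustrative
    "train_micro_batch_size_per_gpu": 1,  # illustrative
    "bf16": {"enabled": True},            # bf16 instead of fp16
    "zero_optimization": {
        "stage": 1,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}
```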
And when I remove the offload_optimizer subconfig, training runs normally. Also, when using fp16 + optimizer offload, the procedure is normal.