
PPO Not Working with DeepSpeed ZeRO-3 #3108

Closed
markelausin opened this issue Apr 2, 2024 · 7 comments
Labels
solved This problem has been already solved

Comments

@markelausin

Reminder

  • I have read the README and searched the existing issues.

Reproduction

The generate() step is failing during PPO with LLaMA 70B + LoRA. I'm using DeepSpeed ZeRO-3 and I've tried with and without offloading, and with and without gradient accumulation. Is the model not being unwrapped correctly? When I print the state dict of the unwrapped model (and also unwrapped_model.pretrained_model.state_dict()), I see the following tensor, which is not 2-D: ('base_model.model.model.embed_tokens.weight', tensor([], device='cuda:7', dtype=torch.bfloat16)). This probably indicates that the embed_tokens weight is being partitioned across multiple GPUs by ZeRO stage 3. Is there a way to fix this?
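For reference, a quick check along these lines suggests the tensor really is a ZeRO-3 placeholder rather than a corrupted checkpoint. This is purely an illustrative snippet run inside the PPO trainer after unwrapping the model; the variable name unwrapped_model and the use of get_input_embeddings() are my own, not LLaMA-Factory code:

# Illustrative diagnostic only: inspect the metadata DeepSpeed attaches to
# ZeRO-3 partitioned parameters. `unwrapped_model` is assumed to be the
# value-head model returned by accelerator.unwrap_model() in the PPO trainer.
weight = unwrapped_model.pretrained_model.get_input_embeddings().weight
print(weight.shape)                        # torch.Size([0]) -> only a placeholder lives on this rank
print(getattr(weight, "ds_shape", None))   # full (vocab_size, hidden_size) shape tracked by ZeRO-3
print(getattr(weight, "ds_status", None))  # ZeroParamStatus.NOT_AVAILABLE outside a gather context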

Here's the model config:

exp_args="--stage ppo \
--model_name_or_path /path/to/Llama2-70b \
--adapter_name_or_path /path/to/sft/lora/adapter \
--create_new_adapter \
--reward_model /path/to/reward_model/adapter \
--output_dir /path/to/output/dir \
--overwrite_output_dir \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--save_steps 0.1 \
--eval_steps 0.1 \
--save_strategy steps \
--warmup_steps 1 \
--top_k 0 \
--top_p 0.9 \
--print_param_status \
--rope_scaling linear \
--evaluation_strategy steps \
--gradient_accumulation_steps 2 \
--learning_rate 1e-5 \
--num_train_epochs 1"

Expected behavior

No response

System Info

version v0.6.1

Others

Traceback (most recent call last):
  File "/lustre/pretraining-checkpoints/pre-training-model-checkpoint/markel/deepspeed/rlhf/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/lustre/pretraining-checkpoints/pre-training-model-checkpoint/markel/deepspeed/rlhf/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/lustre/pretraining-checkpoints/pre-training-model-checkpoint/markel/deepspeed/rlhf/LLaMA-Factory/src/llmtuner/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/lustre/pretraining-checkpoints/pre-training-model-checkpoint/markel/deepspeed/rlhf/LLaMA-Factory/src/llmtuner/train/ppo/workflow.py", line 60, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/lustre/pretraining-checkpoints/pre-training-model-checkpoint/markel/deepspeed/rlhf/LLaMA-Factory/src/llmtuner/train/ppo/trainer.py", line 194, in ppo_train
    mini_batch_queries, mini_batch_responses = self.get_inputs(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/lustre/pretraining-checkpoints/pre-training-model-checkpoint/markel/deepspeed/rlhf/LLaMA-Factory/src/llmtuner/train/ppo/trainer.py", line 311, in get_inputs
    generate_output: torch.Tensor = unwrapped_model.generate(
  File "/usr/local/lib/python3.10/dist-packages/trl/models/modeling_value_head.py", line 203, in generate
    return self.pretrained_model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1190, in generate
    outputs = self.base_model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 972, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
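The error comes from F.embedding receiving the empty ZeRO-3 placeholder instead of the full embedding matrix. Below is a minimal sketch of the kind of workaround involved, using DeepSpeed's deepspeed.zero.GatheredParameters context manager. This is not necessarily the fix that later landed in LLaMA-Factory, and gathering every parameter of a 70B model on each rank may not fit in GPU memory; it only illustrates the mechanism:

import torch
import deepspeed

@torch.no_grad()
def generate_with_gathered_weights(unwrapped_model, **gen_kwargs):
    # Materialize the ZeRO-3 partitioned weights as full tensors for the duration
    # of generate(); modifier_rank=None means read-only access, and the weights
    # are re-partitioned automatically when the context exits.
    params = list(unwrapped_model.parameters())
    with deepspeed.zero.GatheredParameters(params, modifier_rank=None):
        return unwrapped_model.generate(**gen_kwargs)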
@hiyouga hiyouga added the pending This problem is yet to be addressed label Apr 3, 2024
@butujvzipi

This problem still exists in the new version of the framework. When will it be fixed?

@butujvzipi

@hiyouga

@Ricardokevins

Any update on this issue, @hiyouga? QwQ

@yukiwayx

yukiwayx commented Jun 5, 2024

Same problem

@rahul1921

Same problem with Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding):

CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/lora_multi_gpu/llama3_lora_sft_ds.yaml

hiyouga added a commit that referenced this issue Jun 6, 2024
@hiyouga
Owner

hiyouga commented Jun 6, 2024

fixed

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 6, 2024
@hiyouga hiyouga closed this as completed Jun 6, 2024
@ldknight

ldknight commented Jul 4, 2024

@hiyouga Hi, can you tell me how you solved the problem? Thanks!
