Error raised when using PiSSA method with DeepSpeed #4579

Closed
1 task done
hzhaoy opened this issue Jun 27, 2024 · 0 comments · Fixed by #4580
Labels
solved This problem has been already solved

Comments

hzhaoy (Contributor) commented Jun 27, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

System: Ubuntu 20.04.2 LTS
GPU: NVIDIA A100-SXM4-80GB
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.8.3.dev0

Reproduction

Execute command:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /models/qwen/Qwen2-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset mydataset \
    --cutoff_len 4096 \
    --learning_rate 5e-05 \
    --num_train_epochs 5.0 \
    --max_samples 2500 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 100 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2-7B-Chat/lora/train_2024-06-27-03-39-28 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --pissa_init True \
    --pissa_convert True \
    --lora_target all \
    --deepspeed cache/ds_z2_config.json
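
For context, the last argument points at a ZeRO-2 DeepSpeed config via --deepspeed cache/ds_z2_config.json. The file itself is not shown in the report; the snippet below is only a sketch of the kind of ZeRO-2 settings such a file typically contains, written as a small Python script that emits the JSON. The field values are assumptions, not the exact config used in this run.

# Sketch only: writes a typical ZeRO-2 DeepSpeed config to the path used above.
# The values are illustrative assumptions, not taken from the original report.
import json
import os

ds_z2_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        # ZeRO stage 2: partition optimizer states and gradients, keep full params on each rank
        "stage": 2,
        "allgather_partitions": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

os.makedirs("cache", exist_ok=True)
with open("cache/ds_z2_config.json", "w") as f:
    json.dump(ds_z2_config, f, indent=2)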

Error raised:

llm-fct-webui-1  | [rank1]: Traceback (most recent call last):
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/launcher.py", line 23, in <module>
llm-fct-webui-1  | [rank1]:     launch()
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/launcher.py", line 19, in launch
llm-fct-webui-1  | [rank1]:     run_exp()
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
llm-fct-webui-1  | [rank1]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/train/sft/workflow.py", line 72, in run_sft
llm-fct-webui-1  | [rank1]:     trainer = CustomSeq2SeqTrainer(
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/train/sft/trainer.py", line 56, in __init__
llm-fct-webui-1  | [rank1]:     self.save_model(os.path.join(self.args.output_dir, "pissa_init"))
llm-fct-webui-1  | [rank1]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3340, in save_model
llm-fct-webui-1  | [rank1]:     state_dict = self.accelerator.get_state_dict(self.deepspeed)
llm-fct-webui-1  | [rank1]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 3239, in get_state_dict
llm-fct-webui-1  | [rank1]:     if self.deepspeed_config["zero_optimization"]["stage"] == 3:
llm-fct-webui-1  | [rank1]: AttributeError: 'Accelerator' object has no attribute 'deepspeed_config'
llm-fct-webui-1  | 06/26/2024 07:18:36 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
llm-fct-webui-1  | load_model complete
llm-fct-webui-1  | [INFO|trainer.py:641] 2024-06-26 07:18:36,439 >> Using auto half precision backend
llm-fct-webui-1  | [rank0]: Traceback (most recent call last):
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/launcher.py", line 23, in <module>
llm-fct-webui-1  | [rank0]:     launch()
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/launcher.py", line 19, in launch
llm-fct-webui-1  | [rank0]:     run_exp()
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
llm-fct-webui-1  | [rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/train/sft/workflow.py", line 72, in run_sft
llm-fct-webui-1  | [rank0]:     trainer = CustomSeq2SeqTrainer(
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/train/sft/trainer.py", line 56, in __init__
llm-fct-webui-1  | [rank0]:     self.save_model(os.path.join(self.args.output_dir, "pissa_init"))
llm-fct-webui-1  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3340, in save_model
llm-fct-webui-1  | [rank0]:     state_dict = self.accelerator.get_state_dict(self.deepspeed)
llm-fct-webui-1  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 3239, in get_state_dict
llm-fct-webui-1  | [rank0]:     if self.deepspeed_config["zero_optimization"]["stage"] == 3:
llm-fct-webui-1  | [rank0]: AttributeError: 'Accelerator' object has no attribute 'deepspeed_config'
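
The traceback shows the crash happens in CustomSeq2SeqTrainer.__init__, where the initial PiSSA adapter is saved via self.save_model() before Accelerator.prepare() has set up the DeepSpeed engine; transformers' save_model then routes through accelerator.get_state_dict(self.deepspeed), which reads accelerator.deepspeed_config, an attribute that appears to be set only once DeepSpeed has been initialized. Below is a minimal sketch of one way to sidestep the crash: defer the PiSSA init snapshot to a TrainerCallback that runs at on_train_begin, after DeepSpeed is ready. The callback name and the adapter-only save are illustrative assumptions, not necessarily the fix merged in #4580.

# Hypothetical sketch: save the initial PiSSA adapter from a callback instead of
# inside the trainer's __init__. PissaInitCallback is an illustrative name.
import os
from peft import PeftModel
from transformers import TrainerCallback

class PissaInitCallback(TrainerCallback):
    def on_train_begin(self, args, state, control, **kwargs):
        # By on_train_begin the accelerator/DeepSpeed engine has been prepared,
        # and saving only the adapter weights avoids the full-model state-dict
        # gathering path that crashed above.
        model = kwargs.get("model")
        if args.should_save and isinstance(model, PeftModel):
            pissa_init_dir = os.path.join(args.output_dir, "pissa_init")
            model.save_pretrained(pissa_init_dir)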

Expected behavior

Training starts successfully.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 27, 2024
hzhaoy added a commit to hzhaoy/LLaMA-Factory that referenced this issue Jun 27, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 27, 2024
PrimaLuz pushed a commit to PrimaLuz/LLaMA-Factory that referenced this issue Jul 1, 2024
xtchen96 pushed a commit to xtchen96/LLaMA-Factory that referenced this issue Jul 17, 2024