Error raised when using PiSSA method with DeepSpeed #4579

Closed
1 task done
hzhaoy opened this issue Jun 27, 2024 · 0 comments · Fixed by #4580
Labels
solved This problem has been already solved

Comments

hzhaoy (Contributor) commented Jun 27, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

System: Ubuntu 20.04.2 LTS
GPU: NVIDIA A100-SXM4-80GB
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.8.3.dev0

Reproduction

Execute command:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /models/qwen/Qwen2-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset mydataset \
    --cutoff_len 4096 \
    --learning_rate 5e-05 \
    --num_train_epochs 5.0 \
    --max_samples 2500 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 100 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2-7B-Chat/lora/train_2024-06-27-03-39-28 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --pissa_init True \
    --pissa_convert True \
    --lora_target all \
    --deepspeed cache/ds_z2_config.json
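
For context, the last argument points at a ZeRO-2 DeepSpeed config via --deepspeed cache/ds_z2_config.json. The file itself is not shown in the report; the snippet below is only a sketch of the kind of ZeRO-2 settings such a file typically contains, written as a small Python script that emits the JSON. The field values are assumptions, not the exact config used in this run.

# Sketch only: writes a typical ZeRO-2 DeepSpeed config to the path used above.
# The values are illustrative assumptions, not taken from the original report.
import json
import os

ds_z2_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        # ZeRO stage 2: partition optimizer states and gradients, keep full params on each rank
        "stage": 2,
        "allgather_partitions": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

os.makedirs("cache", exist_ok=True)
with open("cache/ds_z2_config.json", "w") as f:
    json.dump(ds_z2_config, f, indent=2)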

Error raised:

llm-fct-webui-1  | [rank1]: Traceback (most recent call last):
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/launcher.py", line 23, in <module>
llm-fct-webui-1  | [rank1]:     launch()
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/launcher.py", line 19, in launch
llm-fct-webui-1  | [rank1]:     run_exp()
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
llm-fct-webui-1  | [rank1]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/train/sft/workflow.py", line 72, in run_sft
llm-fct-webui-1  | [rank1]:     trainer = CustomSeq2SeqTrainer(
llm-fct-webui-1  | [rank1]:   File "/app/src/llamafactory/train/sft/trainer.py", line 56, in __init__
llm-fct-webui-1  | [rank1]:     self.save_model(os.path.join(self.args.output_dir, "pissa_init"))
llm-fct-webui-1  | [rank1]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3340, in save_model
llm-fct-webui-1  | [rank1]:     state_dict = self.accelerator.get_state_dict(self.deepspeed)
llm-fct-webui-1  | [rank1]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 3239, in get_state_dict
llm-fct-webui-1  | [rank1]:     if self.deepspeed_config["zero_optimization"]["stage"] == 3:
llm-fct-webui-1  | [rank1]: AttributeError: 'Accelerator' object has no attribute 'deepspeed_config'
llm-fct-webui-1  | 06/26/2024 07:18:36 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
llm-fct-webui-1  | load_model complete
llm-fct-webui-1  | [INFO|trainer.py:641] 2024-06-26 07:18:36,439 >> Using auto half precision backend
llm-fct-webui-1  | [rank0]: Traceback (most recent call last):
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/launcher.py", line 23, in <module>
llm-fct-webui-1  | [rank0]:     launch()
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/launcher.py", line 19, in launch
llm-fct-webui-1  | [rank0]:     run_exp()
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
llm-fct-webui-1  | [rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/train/sft/workflow.py", line 72, in run_sft
llm-fct-webui-1  | [rank0]:     trainer = CustomSeq2SeqTrainer(
llm-fct-webui-1  | [rank0]:   File "/app/src/llamafactory/train/sft/trainer.py", line 56, in __init__
llm-fct-webui-1  | [rank0]:     self.save_model(os.path.join(self.args.output_dir, "pissa_init"))
llm-fct-webui-1  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3340, in save_model
llm-fct-webui-1  | [rank0]:     state_dict = self.accelerator.get_state_dict(self.deepspeed)
llm-fct-webui-1  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 3239, in get_state_dict
llm-fct-webui-1  | [rank0]:     if self.deepspeed_config["zero_optimization"]["stage"] == 3:
llm-fct-webui-1  | [rank0]: AttributeError: 'Accelerator' object has no attribute 'deepspeed_config'
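
The traceback shows the crash happens in CustomSeq2SeqTrainer.__init__, where the initial PiSSA adapter is saved via self.save_model() before Accelerator.prepare() has set up the DeepSpeed engine; transformers' save_model then routes through accelerator.get_state_dict(self.deepspeed), which reads accelerator.deepspeed_config, an attribute that appears to be set only once DeepSpeed has been initialized. Below is a minimal sketch of one way to sidestep the crash: defer the PiSSA init snapshot to a TrainerCallback that runs at on_train_begin, after DeepSpeed is ready. The callback name and the adapter-only save are illustrative assumptions, not necessarily the fix merged in #4580.

# Hypothetical sketch: save the initial PiSSA adapter from a callback instead of
# inside the trainer's __init__. PissaInitCallback is an illustrative name.
import os
from peft import PeftModel
from transformers import TrainerCallback

class PissaInitCallback(TrainerCallback):
    def on_train_begin(self, args, state, control, **kwargs):
        # By on_train_begin the accelerator/DeepSpeed engine has been prepared,
        # and saving only the adapter weights avoids the full-model state-dict
        # gathering path that crashed above.
        model = kwargs.get("model")
        if args.should_save and isinstance(model, PeftModel):
            pissa_init_dir = os.path.join(args.output_dir, "pissa_init")
            model.save_pretrained(pissa_init_dir)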

Expected behavior

Training starts successfully.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 27, 2024
hzhaoy added a commit to hzhaoy/LLaMA-Factory that referenced this issue Jun 27, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 27, 2024
PrimaLuz pushed a commit to PrimaLuz/LLaMA-Factory that referenced this issue Jul 1, 2024
xtchen96 pushed a commit to xtchen96/LLaMA-Factory that referenced this issue Jul 17, 2024