
Full fine-tuning of ChatGLM3 underperforms the official repo; please help analyze #2991

Closed
1 task done
charliedream1 opened this issue Mar 26, 2024 · 9 comments
Labels
solved This problem has been already solved

Comments

@charliedream1

charliedream1 commented Mar 26, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

When I do full fine-tuning, the training results always fall noticeably short of a model trained with the official ChatGLM code. I compared the parameters and the underlying code in detail but could not find the cause. With the same settings and 2 epochs, the official code trains roughly twice as slowly as this repo's code, with about twice the GPU power draw and utilization, yet its loss convergence and final quality are clearly better than ours. I also tried several versions of transformers with no improvement.

With this repo's code, even when I increase the number of epochs, the model never seems to fully absorb the data compared with the official ChatGLM3 code; the learning feels shallow.

Please help analyze what is causing this. Many thanks.

deepspeed --num_gpus 4 ../../src/train_bash.py \
    --deepspeed ../deepspeed/ds_z3_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path chatglm3 \
    --dataset train \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --output_dir ../../saves/full/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 4096 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 2.0 \
    --val_size 0.1 \
    --ddp_timeout 1800000 \
    --plot_loss \
    --fp16
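For context, the `--deepspeed` flag above points at a ZeRO stage-3 config. The exact file from the repo is not shown in this issue; a minimal sketch of what such a `ds_z3_config.json` typically contains (using DeepSpeed's documented schema, with `"auto"` values resolved by the HuggingFace Trainer integration) would be:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

The `stage3_gather_16bit_weights_on_model_save` option matters for full fine-tuning: without it, ZeRO-3 saves sharded parameters and the checkpoint cannot be loaded as a plain model.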

Expected behavior

No response

System Info

No response

Others

No response

@charliedream1 charliedream1 changed the title from "Full fine-tuning of ChatGLM3 underperforms the official repo" to "Full fine-tuning of ChatGLM3 underperforms the official repo; please help analyze" Mar 26, 2024
@hiyouga
Owner

hiyouga commented Mar 26, 2024

Your template isn't set correctly; you should use chatglm3.
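Concretely, that suggestion amounts to replacing `--template default` with `--template chatglm3` in the launch command from the issue body. An abbreviated sketch (all other flags unchanged from the original command):

```shell
# The chat template must match the model family: ChatGLM3 uses its own
# special tokens and turn separators, so training with "default" formats
# the data differently from what the model saw during pretraining.
deepspeed --num_gpus 4 ../../src/train_bash.py \
    --deepspeed ../deepspeed/ds_z3_config.json \
    --stage sft \
    --model_name_or_path chatglm3 \
    --template chatglm3 \
    --finetuning_type full \
    --output_dir ../../saves/full/sft
```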

@hiyouga hiyouga added the solved This problem has been already solved label Mar 26, 2024
@hiyouga hiyouga closed this as completed Mar 26, 2024
@charliedream1
Author

I did select chatglm3; the command pasted in the issue is wrong. My actual launch script contains many paths and variable names, so I copied the command from this repo's examples instead.

@hiyouga
Owner

hiyouga commented Mar 26, 2024

I suggest checking whether the environments and configurations of the two setups are identical.

@charliedream1
Author

charliedream1 commented Mar 26, 2024 via email

@hiyouga
Owner

hiyouga commented Mar 26, 2024

Could you try this version of the code?
https://github.com/hiyouga/LLaMA-Factory/tree/v0.5.3

@charliedream1
Author

charliedream1 commented Mar 26, 2024 via email

@hiyouga hiyouga reopened this Mar 26, 2024
@hiyouga
Owner

hiyouga commented Mar 26, 2024

We seem to have located the cause: a bug was introduced in a recent update.

@hiyouga
Owner

hiyouga commented Mar 26, 2024

Please update to the latest version (3bcd41b) and retry.

@hiyouga hiyouga closed this as completed Mar 26, 2024
@charliedream1
Author

charliedream1 commented Mar 26, 2024 via email
