Bugs caused by the CastOutputToFloat replacement under transformers==4.31 #268
Comments
Fixed
Thanks for helping to bump the version, but with this version the loss does not decrease during training.
I verified on my side that the loss decreases normally; please check your model files or hyperparameters.
Are you using transformers==4.31 (the version officially certified to run Llama 2 correctly)? Under that version my training loss does not decrease, but with transformers==4.29.1 the loss does decrease. My guess is that … does not really solve the problems introduced by …
My environment is 4.30.0.
4.31 is the version officially designated for Llama 2; they made fixes specifically for Llama 2. With earlier versions, inference exhibits strange behavior.
Can training proceed normally if CastOutputToFloat is removed?
With transformers==4.31 and CastOutputToFloat removed, the training loss does not decrease normally. But with transformers==4.29.1 and CastOutputToFloat removed, the training loss does decrease normally.
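For context, "removing CastOutputToFloat" in the experiment above can be sketched as restoring the bare Linear head from the wrapper. This is a sketch under the assumption that the wrapper is an `nn.Sequential` subclass holding the original `lm_head` as its only child:

```python
import torch

# Hypothetical wrapped head: assuming CastOutputToFloat subclasses nn.Sequential,
# the original Linear layer is its first (and only) child module.
wrapped_head = torch.nn.Sequential(torch.nn.Linear(8, 16, bias=False))

# "Removing CastOutputToFloat" amounts to putting the bare Linear back:
original_head = wrapped_head[0]
print(type(original_head).__name__)  # → Linear
```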
huggingface/transformers@07360b6
Take a look at this: #202
Update libraries using
And it works. Thank you.
In this case the training loss can decrease, but I found we cannot save the checkpoint.
@GitYCC Please update the code and retry.
It still cannot save the checkpoint.
Downgrade the dependencies to stable releases instead of dev builds, then try again.
@hiyouga |
Please provide a script for reproducing the error.
accelerate_config.yaml:
Python version: 3.10, GPU: Tesla V100-SXM2-32GB
@hiyouga
@hiyouga But the training loss is still stuck now.
Consider using English datasets to fine-tune LLaMA-2 models instead of a non-English corpus.
@hiyouga It works, but please help check whether the method is right or not. The versions I used:
Use 4 GPUs to avoid running out of GPU memory.
The transformers version that supports LLaMA 2 is 4.31.
It added this line:
https://github.com/huggingface/transformers/blame/e42587f596181396e1c4b63660abf0c736b10dae/src/transformers/models/llama/modeling_llama.py#L820
where
self.lm_head.weight
fails at runtime, because at https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/553b97a9d59a9fe69df8c4014db4dbb121fbf461/src/llmtuner/extras/misc.py#L95
lm_head is replaced by CastOutputToFloat, and the wrapper therefore no longer has the
weight
attribute. How should this be solved?
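A minimal sketch of the incompatibility, plus one possible workaround (an assumption, not the project's official fix). It assumes CastOutputToFloat is implemented roughly as an `nn.Sequential` subclass that casts the head output to float32:

```python
import torch

class CastOutputToFloat(torch.nn.Sequential):
    """Sketch of the wrapper: casts the lm_head output to float32
    (used for mixed-precision LoRA training)."""
    def forward(self, x):
        return super().forward(x).to(torch.float32)

lm_head = torch.nn.Linear(8, 16, bias=False)
wrapped = CastOutputToFloat(lm_head)

# transformers 4.31 reads `self.lm_head.weight` directly; the wrapper holds
# no parameter named `weight`, so attribute lookup fails:
try:
    _ = wrapped.weight
except AttributeError:
    print("AttributeError: wrapper has no attribute 'weight'")

# Hypothetical workaround: forward `weight` to the wrapped Linear via a property.
class CastOutputToFloatWithWeight(CastOutputToFloat):
    @property
    def weight(self):
        return self[0].weight  # weight of the original Linear head

fixed = CastOutputToFloatWithWeight(lm_head)
assert fixed.weight is lm_head.weight            # attribute access now works
assert fixed(torch.randn(2, 8)).dtype == torch.float32  # cast still applied
```

Whether forwarding `weight` like this is safe depends on how transformers uses the attribute (e.g. for weight tying), so it is a starting point for debugging rather than a verified fix.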