During DPO training, the way the prompt and answer are concatenated prevents the cutoff_length hyperparameter from truncating the data effectively. #4617
Comments
I believe this issue has been mentioned in [...]. As far as I understand, the solution suggested above changes the prompt used for the chosen and rejected responses in DPO, which likely affects the training and results. Instead, I believe the implementation should follow the DPO Trainer's implementation in [...].
PS: This issue is not written in my native language, so I used ChatGPT to translate it to English. I apologize in advance for any confusion.
fixed
Deprecate reserved_label_len arg
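As shown in the figure above, when the source code concatenates the prompt with the rejected response, it uses chosen_prompt rather than rejected_prompt. As a result, when cutoff_length=2048 is set, the rejected data is not truncated effectively.
After modifying the code as shown in the figure below, the data is truncated effectively according to cutoff_length.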
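Since the screenshots are not reproduced here, the following is a minimal sketch of the behaviour described above, not the project's actual code. It assumes a hypothetical helper `truncate_pair` that splits the `cutoff_length` budget between prompt and response in proportion to their lengths. The token lists and the `truncate_jointly` function are also illustrative assumptions. The last variant shows one possible way to keep the same prompt for both responses, which is the concern raised in the comments.

```python
def truncate_pair(prompt_ids, response_ids, cutoff_length):
    """Hypothetical helper: trim a (prompt, response) pair to fit cutoff_length,
    splitting the budget in proportion to the original lengths."""
    total = len(prompt_ids) + len(response_ids)
    if total <= cutoff_length:
        return prompt_ids, response_ids
    max_prompt = max(1, cutoff_length * len(prompt_ids) // total)
    max_response = cutoff_length - max_prompt
    return prompt_ids[:max_prompt], response_ids[:max_response]


def truncate_jointly(prompt_ids, chosen_ids, rejected_ids, cutoff_length):
    """Hypothetical alternative: size the budget from the longer response so that
    chosen and rejected share one identical, truncated prompt."""
    longer = max(chosen_ids, rejected_ids, key=len)
    prompt, longest = truncate_pair(prompt_ids, longer, cutoff_length)
    budget = len(longest)
    return prompt, chosen_ids[:budget], rejected_ids[:budget]


cutoff_length = 2048
prompt = [0] * 1500      # dummy token ids, lengths chosen for illustration
chosen = [1] * 100
rejected = [2] * 3000

# Each pair is truncated independently, as in the behaviour described in the issue.
chosen_prompt, chosen_out = truncate_pair(prompt, chosen, cutoff_length)
rejected_prompt, rejected_out = truncate_pair(prompt, rejected, cutoff_length)

# Reported problem: the rejected response is attached to chosen_prompt.
buggy_len = len(chosen_prompt + rejected_out)

# Reporter's change: attach it to rejected_prompt instead.
patched_len = len(rejected_prompt + rejected_out)
prompts_match = chosen_prompt == rejected_prompt

# Joint truncation: one shared prompt, both sequences within the limit.
shared_prompt, joint_chosen, joint_rejected = truncate_jointly(
    prompt, chosen, rejected, cutoff_length
)
joint_chosen_len = len(shared_prompt + joint_chosen)
joint_rejected_len = len(shared_prompt + joint_rejected)

print(buggy_len, patched_len, prompts_match, joint_chosen_len, joint_rejected_len)
```

With these illustrative lengths, the mismatched concatenation exceeds `cutoff_length`. The reporter's change brings the rejected sequence back within the limit, but the chosen and rejected sequences then start from different prompts. Joint truncation keeps both within the limit with one shared prompt.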