DPO training: a prompt/answer concatenation issue prevents the cutoff_length hyperparameter from truncating the data effectively. #4617

THZdyjy · 2024-06-29T03:28:38Z

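As shown in the screenshot above, when the source code concatenates the prompt with the rejected response, the prompt it uses is chosen_prompt rather than rejected_prompt. As a result, when cutoff_length=2048 is set, the rejected data is not truncated effectively.
After modifying the code as shown in the screenshot below, the data is truncated correctly according to cutoff_length.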
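The effect described above can be reproduced with a small sketch. It assumes that the prompt and the response are truncated proportionally so that each pair fits the length limit. The helpers `truncate_pair`, `build_buggy` and `build_shared` are hypothetical names for illustration and are not LLaMA-Factory's actual code. `build_shared` shows one possible way to keep a single prompt for both responses while still respecting the limit; it is not necessarily what the maintainers or TRL do.

```python
def truncate_pair(prompt_ids, response_ids, cutoff_len):
    """Hypothetical proportional truncation so that prompt + response fits cutoff_len."""
    total = len(prompt_ids) + len(response_ids)
    if total <= cutoff_len:
        return prompt_ids, response_ids
    max_prompt = max(1, cutoff_len * len(prompt_ids) // total)
    max_response = cutoff_len - max_prompt
    return prompt_ids[:max_prompt], response_ids[:max_response]


def build_buggy(prompt_ids, chosen_ids, rejected_ids, cutoff_len):
    """Truncate each pair separately, then reuse the chosen prompt for both (the reported behavior)."""
    chosen_prompt, chosen = truncate_pair(prompt_ids, chosen_ids, cutoff_len)
    _, rejected = truncate_pair(prompt_ids, rejected_ids, cutoff_len)
    # rejected was sized for its own (shorter) prompt, so this sum can exceed cutoff_len
    return chosen_prompt + chosen, chosen_prompt + rejected


def build_shared(prompt_ids, chosen_ids, rejected_ids, cutoff_len):
    """Assumed alternative: size one shared prompt against the longer response."""
    longer = chosen_ids if len(chosen_ids) >= len(rejected_ids) else rejected_ids
    shared_prompt, _ = truncate_pair(prompt_ids, longer, cutoff_len)
    budget = cutoff_len - len(shared_prompt)
    return shared_prompt + chosen_ids[:budget], shared_prompt + rejected_ids[:budget]


if __name__ == "__main__":
    prompt = list(range(1000))
    chosen = list(range(200))
    rejected = list(range(3000))
    for build in (build_buggy, build_shared):
        chosen_input, rejected_input = build(prompt, chosen, rejected, 2048)
        print(build.__name__, len(chosen_input), len(rejected_input))
```

With a 1000-token prompt, a 200-token chosen response and a 3000-token rejected response, the first variant leaves the rejected sequence longer than the limit, while the second keeps both sequences within it and still feeds the same prompt to both responses.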

niravlg · 2024-06-30T06:42:23Z

I believe this issue has already been reported in #4402.

As far as I understand, the solution suggested above changes the prompt used for the chosen and rejected responses in DPO, which would likely affect training and results. Instead, I believe the implementation should follow the DPO Trainer's implementation in
https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py

PS: The issue above is not written in my native language. I used ChatGPT to translate it to English. I apologize in advance for any confusion.

hiyouga · 2024-06-30T17:26:44Z

fixed

github-actions bot added the "pending" label (This problem is yet to be addressed) Jun 29, 2024

hiyouga added a commit that referenced this issue Jun 30, 2024

fix #4402 #4617

1771251

Deprecate reserved_label_len arg

hiyouga added the "solved" label (This problem has been already solved) and removed the "pending" label (This problem is yet to be addressed) Jun 30, 2024

hiyouga closed this as completed Jun 30, 2024

PrimaLuz pushed a commit to PrimaLuz/LLaMA-Factory that referenced this issue Jul 1, 2024

fix hiyouga#4402 hiyouga#4617

2237fdf

Deprecate reserved_label_len arg

xtchen96 pushed a commit to xtchen96/LLaMA-Factory that referenced this issue Jul 17, 2024

fix hiyouga#4402 hiyouga#4617

d1ea1d5

Deprecate reserved_label_len arg
