Reminder
I have read the README and searched the existing issues.
System Info
I encountered an inconsistency in how truncation is implemented for DPO in LLAMA-Factory compared to HuggingFace's DPOTrainer.
In LLAMA-Factory, it seems the cutoff length is only enforced for the chosen response: infer_max_len is applied individually to both the (prompt + chosen) and (prompt + rejected) sequences (check out the pairwise dataset implementation here).
Check out the way infer_max_len is used; the definition of infer_max_len is here.
However, to keep the prompt identical for the chosen and rejected responses, the prompt obtained from truncating the (prompt + chosen) ids is prepended to the rejected response. This results in rejected sequences that exceed the cutoff limit, as the sketch below illustrates.
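For illustration, here is a simplified sketch of the behavior described above. The infer_max_len below is only an approximation of a proportional length budget, not the actual LLAMA-Factory code, and truncate_pair is a hypothetical helper:

```python
# Simplified sketch of the described truncation behavior; infer_max_len here
# is an approximation, not the actual LLaMA-Factory implementation.

def infer_max_len(source_len: int, target_len: int, cutoff_len: int) -> tuple[int, int]:
    # Split cutoff_len between prompt (source) and response (target)
    # proportionally to their original lengths.
    max_target_len = max(int(cutoff_len * target_len / (source_len + target_len)), 1)
    max_source_len = cutoff_len - max_target_len
    return max_source_len, max_target_len

def truncate_pair(prompt_ids, chosen_ids, rejected_ids, cutoff_len=2048):
    # The prompt budget is computed from the *chosen* response only.
    src_len, tgt_len = infer_max_len(len(prompt_ids), len(chosen_ids), cutoff_len)
    prompt_ids = prompt_ids[:src_len]    # prompt truncated w.r.t. chosen
    chosen_ids = chosen_ids[:tgt_len]    # len(prompt + chosen) <= cutoff_len

    # rejected gets its own budget, but the prompt above is reused as-is,
    # so len(prompt + rejected) can end up larger than cutoff_len.
    _, rej_len = infer_max_len(len(prompt_ids), len(rejected_ids), cutoff_len)
    rejected_ids = rejected_ids[:rej_len]
    return prompt_ids, chosen_ids, rejected_ids

prompt, chosen, rejected = list(range(1000)), list(range(3000)), list(range(10000))
p, c, r = truncate_pair(prompt, chosen, rejected, cutoff_len=2048)
print(len(p) + len(c), len(p) + len(r))  # 2048 2460 -> rejected pair exceeds the cutoff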
Reproduction
I printed out the maximum sequence lengths for both the chosen and rejected responses and noticed this discrepancy: the cutoff is set to 2048, the chosen responses adhere to it, but the rejected responses do not.
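A check along these lines surfaces the discrepancy; the column names (prompt_ids, chosen_ids, rejected_ids) are placeholders and may differ from what the pairwise preprocessing actually stores:

```python
# Hypothetical length check over the tokenized pairwise dataset; the field
# names are assumptions, not guaranteed to match LLaMA-Factory's columns.
def report_max_lengths(dataset, cutoff_len=2048):
    max_chosen = max(len(ex["prompt_ids"]) + len(ex["chosen_ids"]) for ex in dataset)
    max_rejected = max(len(ex["prompt_ids"]) + len(ex["rejected_ids"]) for ex in dataset)
    print(f"cutoff_len:              {cutoff_len}")
    print(f"max prompt+chosen len:   {max_chosen}")    # stays within cutoff_len
    print(f"max prompt+rejected len: {max_rejected}")  # can exceed cutoff_len
```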
Expected behavior
This has two major issues:

1. The sequence length of the rejected responses may exceed the cutoff limit, which causes Out of Memory (OOM) errors in the middle of a run.
2. It is inconsistent with how HuggingFace's DPOTrainer is implemented.

In HuggingFace, the DPOTrainer uses the longer of the chosen and rejected responses to decide how much of the prompt and the response should be cut off, and it limits both the chosen and rejected sequences to max_length. Check out the exact implementation here. A minimal sketch of that strategy follows.
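The sketch below is simplified from the behavior described above rather than copied from the trl source; the max_prompt_length parameter and keeping the end of the prompt are assumptions:

```python
# Sketch of the HuggingFace-style strategy: decide truncation from the longer
# response, and cap BOTH pairs at max_length. Not the verbatim trl code.

def truncate_pair_hf_style(prompt_ids, chosen_ids, rejected_ids,
                           max_length=2048, max_prompt_length=512):
    longer_response = max(len(chosen_ids), len(rejected_ids))

    # If the longer pair is over budget, truncate the prompt first.
    if len(prompt_ids) + longer_response > max_length:
        prompt_ids = prompt_ids[-max_prompt_length:]

    # If it is still over budget, truncate the responses so that
    # len(prompt + answer) <= max_length holds for both chosen and rejected.
    if len(prompt_ids) + longer_response > max_length:
        answer_budget = max_length - len(prompt_ids)
        chosen_ids = chosen_ids[:answer_budget]
        rejected_ids = rejected_ids[:answer_budget]

    return prompt_ids, chosen_ids, rejected_ids
```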
Could you also let us know why the cutoff length has been implemented this way? Is this a commonly used method for DPO?
Others
No response