Questions about Nectar Datasets #20
@hendrydong Hi Hanze, can you look into this?
Hi, pairs are randomly sampled from the whole dataset. The response with the higher score is assigned the positive (chosen) label, and the one with the lower score is assigned the negative (rejected) label. Beta is fixed at 0.1.
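A minimal Python sketch of the pairing rule described above, together with the standard DPO loss using the fixed beta of 0.1. It is not the project's code. Assumptions: each prompt's responses are given as hypothetical `(text, score)` tuples where a higher score means a better response; tied pairs are skipped (the thread does not say how ties are handled); `sample_pair` and `dpo_loss` are illustrative helper names; and the thread does not state how many pairs are drawn per prompt, so the sketch draws one pair per call.

```python
import math
import random
from typing import List, Optional, Tuple

BETA = 0.1  # fixed DPO beta, as stated in the thread


def sample_pair(
    responses: List[Tuple[str, float]], rng: random.Random
) -> Optional[Tuple[str, str]]:
    """Draw two distinct responses at random and return (chosen, rejected).

    The higher-scored response becomes the chosen (positive) one.
    Returns None on a tie; skipping ties is an assumption, not from the thread.
    """
    (text_a, score_a), (text_b, score_b) = rng.sample(responses, 2)
    if score_a == score_b:
        return None
    if score_a > score_b:
        return text_a, text_b
    return text_b, text_a


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = BETA,
) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin of log-ratios)."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # Numerically stable softplus(-margin) == -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical responses for a single prompt.
    responses = [("answer A", 4.0), ("answer B", 7.5), ("answer C", 2.0)]
    print(sample_pair(responses, rng))
```

If a dataset stores ranks where a smaller number is better instead of scores, negate the ranks before passing them in so that the comparison in `sample_pair` still picks the better response as chosen.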
How many pairs are randomly sampled for each prompt?
Bo (@bpucla) may be able to give you more details.
Hi.
The paper mentions that offline vanilla DPO is trained on the Nectar dataset. I have several questions about this.
Thanks for your assistance.