Reloading and evaluating a trained reward model #4743
Comments
Only loading the RM with LLaMA-Factory is supported
How can I evaluate the RM?
Change `do_train` in the training script to `do_eval`
Got it. After modifying the YAML, the reward prediction results are written to `output_dir`
@hiyouga @xd2333 Command: `llamafactory-cli train /root/autodl-tmp/llm_prj/AdGen/config/reward_infer_model.yaml`, with the config as follows:

```yaml
### model
model_name_or_path: /root/autodl-tmp/llm_prj/AdGen/reward_model/merge

### method
stage: rm
do_train: false

### dataset
dataset: ad_dpo

### output
output_dir: /root/autodl-tmp/llm_prj/AdGen/reward_model/infer

### train
per_device_train_batch_size: 1

### eval
val_size: 0.1
```
Set `eval_dataset: ad_dpo`, and remove `val_size: 0.1`, `eval_strategy: steps`, and `eval_steps: 500`
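Putting the maintainer's suggestion together with the user's config, an evaluation-only RM config would look roughly like the sketch below. This is untested and the paths come from the user's own setup; key names follow the style of LLaMA-Factory's example configs, so verify them against the repo's `examples/` directory:

```yaml
### model
model_name_or_path: /root/autodl-tmp/llm_prj/AdGen/reward_model/merge

### method
stage: rm
do_train: false

### dataset
eval_dataset: ad_dpo

### output
output_dir: /root/autodl-tmp/llm_prj/AdGen/reward_model/infer

### eval
per_device_eval_batch_size: 1
```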
Could you provide a demo of a standard multimodal request? I've been trying for a while and can't figure out how to assemble the `message`... Alternatively, if using the trl library, how should I load the model and run inference to get a score? Many thanks!
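On the trl side of this question: a reward model trained this way carries a value head, and trl's `AutoModelForCausalLMWithValueHead` returns per-token values on its forward pass; the scalar reward is conventionally the value at the last non-padded token. The snippet below is a minimal, model-free sketch of just that extraction step — the numbers are made up, and the function name is illustrative, not a LLaMA-Factory or trl API:

```python
def sequence_reward(values, attention_mask):
    """Return the value-head output at the last non-padded position.

    values         -- per-token value-head outputs, one float per position
    attention_mask -- 1 for real tokens, 0 for padding
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return values[last]

# Example: a right-padded sequence of 4 real tokens and 2 pad tokens.
vals = [0.1, -0.3, 0.5, 1.2, 0.0, 0.0]
mask = [1, 1, 1, 1, 0, 0]
print(sequence_reward(vals, mask))  # 1.2
```

With a real model, `values` would be the third element of the tuple returned by the value-head model's forward pass, one row per sequence in the batch.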
A question: does reward model training support a data format like openbookqa — one prompt with multiple responses, as in InstructGPT?
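Pairwise RM training consumes preference pairs (a chosen vs. a rejected response), so a one-prompt/N-response dataset is usually expanded into all pairwise comparisons first. A rough sketch of that expansion follows; the `instruction`/`chosen`/`rejected` field names are an assumption modeled on alpaca-style preference data, so check the repo's `data/README.md` for the exact schema:

```python
from itertools import combinations

def to_pairs(prompt, ranked_responses):
    """Expand one prompt plus N responses (ranked best-first) into
    pairwise preference records for reward-model training."""
    return [
        {"instruction": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_responses, 2)
    ]

# 3 ranked responses yield 3 comparison pairs.
pairs = to_pairs("Which gas do plants absorb?", ["CO2", "oxygen", "helium"])
print(len(pairs))  # 3
```

This mirrors how InstructGPT turned K ranked completions into K·(K−1)/2 comparisons for its pairwise ranking loss.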
Reminder
System Info
Reproduction
Export: `llamafactory-cli export --model_name_or_path="./save" --stage=rm --export_dir="./see12" --template=default`
Test:
Error:
v_head weight is found. This IS expected if you are not resuming PPO training
#4379 (comment)
Expected behavior
No response
Others
No response