rlhf PPO training throws an error #17
Comments
It may be that your code isn't up to date with the latest version. Also, please provide a more detailed configuration.
The code was pulled from the main branch at 2 PM this afternoon, and the environment was installed according to the requirements file.
This is common_args.py: `class TrainArgPath(Enum):` `class CommonArgs:`
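The paste above appears truncated. For illustration only, here is a minimal sketch of an enum-plus-dataclass argument file in this style; all names and default values below are assumptions, not the repository's actual contents:

```python
from dataclasses import dataclass
from enum import Enum


class TrainArgPath(Enum):
    # Hypothetical members: each one points at a training-argument module
    SFT_ARGS = "train_args/sft_config.py"
    PPO_ARGS = "train_args/ppo_config.py"


@dataclass
class CommonArgs:
    # Hypothetical fields; the real CommonArgs may differ
    model_name_or_path: str = "Qwen1.5-0.5B"
    train_data_path: str = "data/data.jsonl"
    train_args_path: str = TrainArgPath.PPO_ARGS.value
    train_mode: str = "full"  # "full", "lora", or "qlora"
```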
This is the ppo_config.py file: `@dataclass`
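Likewise, a hedged sketch of what a `@dataclass`-style PPO config might contain, using the field names that come up later in this thread (defaults are placeholders, not the repository's values):

```python
from dataclasses import dataclass


@dataclass
class PPOConfig:
    # Fields referenced later in this thread; values are placeholders
    per_device_train_batch_size: int = 1
    gradient_accumulation_steps: int = 16
    num_train_epochs: int = 2
    eval_samples: int = 8  # samples held out from the training data for evaluation
    learning_rate: float = 1e-5
```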
The train data shipped with the project is only an example; if there isn't enough of it, you'll get an error. The number of samples needs to be greater than batch * gradient_accumulation_steps. Also, an earlier version forgot the eval_samples parameter, so pull the latest code again, increase the amount of data, and try once more.
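As a rough sanity check before launching (a sketch only; the file path, the field values, and the num_processes factor are assumptions to adapt to your own config):

```python
# Count examples in the jsonl training file (path is illustrative)
with open("data/data.jsonl", encoding="utf-8") as f:
    num_samples = sum(1 for line in f if line.strip())

# Plug in the values from your PPO config / accelerate yaml
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
eval_samples = 8
num_processes = 1  # processes launched by accelerate (one per GPU)

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_processes
train_samples = num_samples - eval_samples

assert train_samples > effective_batch, (
    f"Only {train_samples} training samples left after holding out {eval_samples} "
    f"for eval, but one optimizer step needs more than {effective_batch}."
)
```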
I tried qwen1.5-0.5B PPO full locally and it starts up normally. What is your launch command? Does num_processes in the yaml file match the number of GPUs? If that still doesn't solve it, I suggest launching with `python rlhf_train.py` (you can use qlora or lora to reduce GPU memory) to see the specific error.
The launch command is `CUDA_VISIBLE_DEVICES=1 nohup accelerate launch --config_file ./ds_config/deepspeed_zero3.yaml rlhf_train.py`. What did you set these two parameters to on your side: eval_samples and per_device_train_batch_size? Are you also using the data.jsonl from the example data?
First, confirm whether num_processes in the zero3.yaml file has been changed to 1. num_train_epochs: 2
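For reference, that field lives at the top level of the accelerate config yaml; a trimmed, illustrative excerpt (the real ds_config/deepspeed_zero3.yaml contains more settings):

```yaml
# ds_config/deepspeed_zero3.yaml (excerpt, illustrative)
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
num_processes: 1  # must match the number of GPUs made visible (here a single card)
```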
Yes, the num_processes setting in the zero3.yaml file is fine.
Very strange. Launch with `python rlhf_train.py` (you can use qlora or lora to reduce GPU memory) and check the specific error; the error above can't be pinpointed.
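A plain single-process launch for debugging, based on the command quoted above, might look like this (dropping nohup so the traceback prints to the terminal):

```bash
# Run without the accelerate/DeepSpeed wrapper to see the full traceback
CUDA_VISIBLE_DEVICES=1 python rlhf_train.py
# If full fine-tuning runs out of memory here, switch the train mode to lora or qlora in the config first
```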
When using the qwen1.5-chatchat model for rlhf PPO training, I encountered the following error: