
PPO full-parameter training: lr scheduler has no effect, loss first decreases then keeps increasing, and reward also drops #5060

Closed
andylrx opened this issue Aug 3, 2024 · 1 comment · Fixed by #5163
Labels: solved (This problem has been already solved)

Comments

@andylrx

andylrx commented Aug 3, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

When running full-parameter PPO training with a cosine lr scheduler configured, the printed learning rate never changes. During training, the loss decreases over the first 10 steps and then keeps increasing. I checked the SFT and reward models and found no problems with them.
[Screenshot: WechatIMG19]
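A quick way to confirm the symptom is to scan the per-step training log and count how many distinct learning-rate values were actually recorded. This is a minimal sketch, assuming the run writes a JSON-lines log named trainer_log.jsonl with a learning_rate field in the output directory (both the filename and the field name are assumptions; adjust to whatever your run actually produces):

```python
import json

# Sketch: count distinct recorded learning rates.
# Assumes output_dir/trainer_log.jsonl with a "learning_rate" field per logged step.
lrs = []
with open("models/Qwen2-7B-ppo-new/trainer_log.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if "learning_rate" in record:
            lrs.append(record["learning_rate"])

print(f"{len(set(lrs))} distinct learning-rate values across {len(lrs)} logged steps")
# A working cosine schedule should produce many distinct values, not just one.
```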

Reproduction

### model
model_name_or_path: models/Qwen2-7B-sft-new
ref_model: models/Qwen2-7B-sft-new
reward_model: models/Qwen2-7B-reward-new
reward_model_type: full

### method
stage: ppo
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset
dataset: ppo_dataset
template: qwen
cutoff_len: 2048
max_samples: 10000000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: models/Qwen2-7B-ppo-new
logging_steps: 1
save_steps: 500000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
ppo_buffer_size: 4
learning_rate: 1.0e-6
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
ddp_timeout: 180000000

### generate
max_new_tokens: 1024
top_k: 0
top_p: 0.9
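For reference, here is a minimal standalone sketch (not LLaMA-Factory code) of what the configured schedule is expected to do: with lr_scheduler_type: cosine, learning_rate: 1.0e-6 and warmup_ratio: 0.01, the learning rate should warm up and then decay on every optimizer step. The model, optimizer, and total step count below are assumed placeholders:

```python
import torch
from transformers import get_scheduler

# Stand-in model/optimizer; num_training_steps is an assumed placeholder.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.0e-6)

num_training_steps = 1000
scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=int(0.01 * num_training_steps),  # warmup_ratio: 0.01
    num_training_steps=num_training_steps,
)

for step in range(5):
    optimizer.step()
    scheduler.step()  # must run once per optimizer step for the lr to change
    print(step, optimizer.param_groups[0]["lr"])

# If the PPO loop never calls scheduler.step() on the scheduler attached to its
# optimizer, the printed learning rate stays fixed at the initial value.
```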

Expected behavior

No response

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label on Aug 3, 2024
@andylrx
Author

andylrx commented Aug 5, 2024

Could someone please take a look at this issue?

liu-zichen mentioned this issue on Aug 13, 2024
hiyouga added the solved (This problem has been already solved) label and removed the pending (This problem is yet to be addressed) label on Aug 19, 2024