PPO full-parameter training: lr scheduler has no effect, loss first decreases then keeps increasing, and reward also decreases #5060

andylrx · 2024-08-03T03:23:05Z

Reminder

I have read the README and searched the existing issues.

System Info

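When running full-parameter PPO training, I set a cosine lr scheduler, but the printed learning rate never changes. During training, the loss decreases over the first 10 steps and then keeps increasing. I have checked the SFT model and the reward model and found no problems with them.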

Reproduction

### model
model_name_or_path: models/Qwen2-7B-sft-new
ref_model: models/Qwen2-7B-sft-new
reward_model: models/Qwen2-7B-reward-new
reward_model_type: full

### method
stage: ppo
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset
dataset: ppo_dataset
template: qwen
cutoff_len: 2048
max_samples: 10000000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: models/Qwen2-7B-ppo-new
logging_steps: 1
save_steps: 500000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
ppo_buffer_size: 4
learning_rate: 1.0e-6
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
ddp_timeout: 180000000

### generate
max_new_tokens: 1024
top_k: 0
top_p: 0.9
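
For comparison with the logged values, here is a minimal sketch of the learning rate one would expect from a cosine schedule with linear warmup, using `learning_rate: 1.0e-6` and `warmup_ratio: 0.01` from the config above. The function name `cosine_lr_with_warmup` and the total of 1000 optimizer steps are assumptions made for illustration (the real step count depends on the dataset size and batch settings), and this is not LLaMA-Factory's own scheduler code.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=1.0e-6, warmup_ratio=0.01):
    """Expected learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr, then half-cosine decay to 0.
    Hypothetical helper for illustration only.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # assumed value, not taken from the issue

for step in (0, 5, 10, 505, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

If the learning rate printed during training stays at a single value across many steps instead of following a curve of this shape, the scheduler is likely not being stepped.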

Expected behavior

No response

Others

No response

andylrx · 2024-08-05T06:45:27Z

Could someone please take a look at this issue?

github-actions bot added the "pending" label (This problem is yet to be addressed) on Aug 3, 2024

liu-zichen mentioned this issue Aug 13, 2024

fix lr not change #5163

Merged

2 tasks

hiyouga closed this as completed in #5163 Aug 19, 2024

hiyouga added the "solved" label (This problem has been already solved) and removed the "pending" label (This problem is yet to be addressed) on Aug 19, 2024
