
Does this project support GRPO? #23

Open
StarDewXXX opened this issue Jan 30, 2025 · 9 comments

Comments

@StarDewXXX

scripts/train_tiny_zero.sh uses PPO for training, but DeepSeek-R1 uses GRPO. Does this project support GRPO training?

@JerryWu-code

Yes, this project supports GRPO and I've tried it. You can find it in my fork, which I also mentioned in my last response in #5. Or you can find the author's script here and configure it yourself. I'd suggest using mine if you have limited RAM (e.g., a single A100), since I've already tuned the hyperparameters and reproduced the results.
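For reference, here is a minimal sketch of what switching a verl-style trainer (which TinyZero builds on) from PPO to GRPO can look like. The exact entry point and key names in the fork may differ; `main_ppo`, `algorithm.adv_estimator=grpo`, and `actor_rollout_ref.rollout.n` are assumptions based on verl's config layout, not values taken from this thread.

```shell
# Hypothetical GRPO run sketch (verl-style Hydra overrides) — check your
# fork's script for the exact keys and paths before running.
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    actor_rollout_ref.rollout.n=5 \
    data.train_files=$DATA_DIR/train.parquet \
    data.val_files=$DATA_DIR/test.parquet \
    trainer.n_gpus_per_node=2 \
    trainer.nnodes=1
```

GRPO replaces PPO's learned critic with a group-relative baseline, which is why sampling several rollouts per prompt (`rollout.n` above) matters: the advantage of each sample is computed relative to the group's mean reward.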

@Benjoyo

Benjoyo commented Jan 30, 2025

> Yes, this project supports GRPO and I've tried it. You can find it in my fork, which I also mentioned in my last response in #5. Or you can find the author's script here and configure it yourself. I'd suggest using mine if you have limited RAM (e.g., a single A100), since I've already tuned the hyperparameters and reproduced the results.

Thanks for that. Does it work well with just 5 rollouts? I've read that something like 16 is needed for stable training.

@mingxin-zheng

Hi @JerryWu-code, thanks for sharing! When you say "reproduced", do you see the CoT length increase and the answers become more accurate?

@beanapologist

beanapologist commented Jan 30, 2025 via email

@JerryWu-code

> Hi @JerryWu-code , thanks for sharing! When you mention reproduce, do you see the CoT length increases and the answer becomes more accurate?

Sure, it does — you can check my comment and report here: #5 (comment)

@JerryWu-code

> Thanks for that. It works well with just 5 rollouts? I read about something like 16 for stable training.

Sure, it works reasonably well — you can check the result I trained with those parameters here: #5 (comment).

@mingxin-zheng

Thanks! I can also reproduce a similar result using this set of params @JerryWu-code

@jacklanda

It would be great if the authors could update the README with direct GRPO usage instructions :)

@AstonyJ

AstonyJ commented Feb 12, 2025

> Yes, this project supports GRPO and I've tried it. You can find it in my fork, which I also mentioned in my last response in #5. Or you can find the author's script here and configure it yourself. I'd suggest using mine if you have limited RAM (e.g., a single A100), since I've already tuned the hyperparameters and reproduced the results.

I used your code, but why does it only work with 2 GPUs and not with 4 or 8? The issue is similar to #56 (comment)
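One thing worth checking when scaling the GPU count in verl-style configs is the interaction between the per-node GPU count and the rollout tensor-parallel size: the latter typically has to divide the former evenly. This is a hypothetical checklist based on verl's config layout, not a confirmed diagnosis of #56 — the key names below are assumptions:

```shell
# Hypothetical knobs to revisit when moving from 2 to 4 or 8 GPUs.
# tensor_model_parallel_size must evenly divide n_gpus_per_node, and
# batch sizes usually need to be divisible by the number of workers.
python3 -m verl.trainer.main_ppo \
    trainer.n_gpus_per_node=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    data.train_batch_size=256
```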
