Does this project support GRPO? #23
Comments
Yes, this project supports GRPO, and I've tried it. You can find it in my fork, which I also mentioned in my last response in #5. Alternatively, you can take the author's script here and configure it yourself. I suggest using mine if you have limited GPU memory (e.g. on an A100), because I've already tuned the hyperparameters and reproduced the results. |
Thanks for that. Does it work well with just 5 rollouts? I've read that something like 16 is needed for stable training. |
Hi @JerryWu-code, thanks for sharing! When you mention reproducing, do you see the CoT length increase and the answers become more accurate? |
Hi guys,
Let me know if I can help. I only use Google Colab though!
Thanks!
Sarah
On Thu, Jan 30, 2025 at 7:08 AM Mingxin Zheng wrote:
Hi @JerryWu-code, thanks for sharing! When you mention reproduce, do you see the CoT length increases and the answer becomes more accurate?
|
Yes, that effect is there. You can check my comment and report here: #5 (comment) |
Yes, it works reasonably well. You can check the results I got from training with those parameters here: #5 (comment). |
Thanks! I can also reproduce a similar result using this set of params, @JerryWu-code. |
It would help if the authors updated their README with direct GRPO usage instructions :) |
I used your code, but why does it only work with 2 GPUs and not with 4 or 8 GPUs? The issue is similar to: #56 (comment) |
scripts/train_tiny_zero.sh uses PPO for training, but DeepSeek-R1 uses GRPO. Does this project support GRPO training?
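For context on the PPO vs. GRPO question and the rollout count discussed in the comments: GRPO drops PPO's learned value function and instead scores each rollout relative to the other rollouts sampled for the same prompt. The sketch below is a minimal, framework-free illustration of that group-relative advantage, not this project's actual code. The function name, the reward values, and the use of the population standard deviation with a small `eps` are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar reward per rollout sampled for a single prompt.
    Hypothetical helper: the name and the eps default are illustrative only.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; 0.0 if all rewards match
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for a group of 5 rollouts of one prompt (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)

# If every rollout in a group gets the same reward, all advantages are zero,
# so that prompt contributes no learning signal for this step.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0, 1.0])
```

Because the baseline is the group mean, a larger group gives a less noisy estimate of it, which is one plausible reason larger rollout counts are suggested for stability. Smaller groups trade some of that stability for lower memory use.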