
Does this project support GRPO? #23

Open
StarDewXXX opened this issue Jan 30, 2025 · 9 comments

Comments

@StarDewXXX

scripts/train_tiny_zero.sh uses PPO for training, but DeepSeek-R1 uses GRPO. Does this project support GRPO training?

@JerryWu-code

Yes, this project supports GRPO and I've tried it. You can find it in my fork, which I also mentioned in my last response in #5. Or you can find the author's script here and configure it yourself. I'd suggest using mine if you have limited RAM (e.g., a single A100), since I've already tuned the hyperparameters and reproduced the results.
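For reference, here is a minimal sketch of what switching a verl-style trainer (which TinyZero builds on) from PPO to GRPO can look like. The exact entry point and key names in the fork may differ; `main_ppo`, `algorithm.adv_estimator=grpo`, and `actor_rollout_ref.rollout.n` are assumptions based on verl's config layout, not values taken from this thread.

```shell
# Hypothetical GRPO run sketch (verl-style Hydra overrides) — check your
# fork's script for the exact keys and paths before running.
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    actor_rollout_ref.rollout.n=5 \
    data.train_files=$DATA_DIR/train.parquet \
    data.val_files=$DATA_DIR/test.parquet \
    trainer.n_gpus_per_node=2 \
    trainer.nnodes=1
```

GRPO replaces PPO's learned critic with a group-relative baseline, which is why sampling several rollouts per prompt (`rollout.n` above) matters: the advantage of each sample is computed relative to the group's mean reward.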

@Benjoyo

Benjoyo commented Jan 30, 2025

> Yes, this project supports GRPO and I've tried it. You can find it in my fork, which I also mentioned in my last response in #5. Or you can find the author's script here and configure it yourself. I'd suggest using mine if you have limited RAM (e.g., a single A100), since I've already tuned the hyperparameters and reproduced the results.

Thanks for that. Does it work well with just 5 rollouts? I've read that something like 16 is needed for stable training.

@mingxin-zheng

Hi @JerryWu-code, thanks for sharing! When you say "reproduced", do you see the CoT length increase and the answers become more accurate?

@beanapologist

beanapologist commented Jan 30, 2025 via email

@JerryWu-code

> Hi @JerryWu-code , thanks for sharing! When you mention reproduce, do you see the CoT length increases and the answer becomes more accurate?

Sure, it does — you can check my comment and report here: #5 (comment)

@JerryWu-code

> Thanks for that. It works well with just 5 rollouts? I read about something like 16 for stable training.

Sure, it works reasonably well — you can check the result I trained with those parameters here: #5 (comment).

@mingxin-zheng

Thanks! I can also reproduce a similar result using this set of params @JerryWu-code

@jacklanda

It would be great if the authors could update the README with direct GRPO usage instructions :)

@AstonyJ

AstonyJ commented Feb 12, 2025

> Yes, this project supports GRPO and I've tried it. You can find it in my fork, which I also mentioned in my last response in #5. Or you can find the author's script here and configure it yourself. I'd suggest using mine if you have limited RAM (e.g., a single A100), since I've already tuned the hyperparameters and reproduced the results.

I used your code, but why does it only work with 2 GPUs and not with 4 or 8? The issue is similar to #56 (comment)
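One thing worth checking when scaling the GPU count in verl-style configs is the interaction between the per-node GPU count and the rollout tensor-parallel size: the latter typically has to divide the former evenly. This is a hypothetical checklist based on verl's config layout, not a confirmed diagnosis of #56 — the key names below are assumptions:

```shell
# Hypothetical knobs to revisit when moving from 2 to 4 or 8 GPUs.
# tensor_model_parallel_size must evenly divide n_gpus_per_node, and
# batch sizes usually need to be divisible by the number of workers.
python3 -m verl.trainer.main_ppo \
    trainer.n_gpus_per_node=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    data.train_batch_size=256
```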
