Hyper-parameter tuning for other models #69

Tejaswgupta · 2024-09-28T07:17:31Z

Kudos on the great work so far. I saw the hyperparameters for the mainstream models, but is there any resource for finding the optimal hyperparameters for models like Qwen without extensive trial and error?

Thanks in advance.

yumeng5 · 2024-10-13T00:11:12Z

Hi @Tejaswgupta

We found that hyperparameter tuning is quite necessary for all models (Llama, Gemma, Mistral, etc.) and all methods (DPO, SimPO, etc.) we experimented with, so I don't think one can directly obtain the optimal hyperparameters without tuning for new models.

That said, we provided a guide to help tune hyperparameters more efficiently: first tune the learning rate while keeping the other hyperparameters (beta and gamma) fixed at their default values, then tune gamma a bit (by tuning gamma_beta_ratio while keeping beta fixed). This should hopefully reduce the number of trials required.
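The two-stage procedure can be sketched as a small coordinate-wise search. This is a hypothetical illustration, not code from the SimPO repository: `SimPOConfig`, `staged_search`, and `toy_evaluate` are made-up names, the grid values and "default" settings are placeholders rather than recommended values, and it assumes gamma = gamma_beta_ratio × beta, as the parameter name suggests. `toy_evaluate` stands in for a real training-plus-evaluation run.

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class SimPOConfig:
    learning_rate: float
    beta: float
    gamma_beta_ratio: float

    @property
    def gamma(self) -> float:
        # Assumption: gamma is tied to beta through the ratio, so sweeping
        # the ratio with beta fixed is equivalent to sweeping gamma.
        return self.gamma_beta_ratio * self.beta


def staged_search(
    base: SimPOConfig,
    learning_rates: Iterable[float],
    ratios: Iterable[float],
    evaluate: Callable[[SimPOConfig], float],
) -> Tuple[SimPOConfig, int]:
    """Two-stage search; returns (best config, number of distinct runs)."""
    cache: Dict[SimPOConfig, float] = {}

    def score(cfg: SimPOConfig) -> float:
        # Each cache miss represents one full training + evaluation run.
        if cfg not in cache:
            cache[cfg] = evaluate(cfg)
        return cache[cfg]

    # Stage 1: sweep the learning rate; beta and gamma_beta_ratio stay at base values.
    best = max((replace(base, learning_rate=lr) for lr in learning_rates), key=score)
    # Stage 2: keep the chosen learning rate and beta fixed; sweep gamma_beta_ratio.
    # The stage-1 winner is kept as a candidate (already cached, so no extra run).
    best = max([best, *(replace(best, gamma_beta_ratio=r) for r in ratios)], key=score)
    return best, len(cache)


def toy_evaluate(cfg: SimPOConfig) -> float:
    # Hypothetical stand-in for "train with cfg, then score on a held-out benchmark".
    return -(math.log10(cfg.learning_rate) + 6) ** 2 - (cfg.gamma_beta_ratio - 0.5) ** 2


# Placeholder defaults and grids, for illustration only.
base = SimPOConfig(learning_rate=5e-7, beta=2.5, gamma_beta_ratio=0.3)
best, n_trials = staged_search(
    base, [3e-7, 5e-7, 8e-7, 1e-6], [0.0, 0.25, 0.5, 0.75, 1.0], toy_evaluate
)
print(best, best.gamma, n_trials)
```

With a 4-value learning-rate grid and a 5-value ratio grid, the staged search runs 4 + 5 = 9 trainings instead of the 20 a full grid would need. The trade-off is that it assumes the best learning rate does not shift much as gamma changes.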

I hope this is helpful!

Best,
Yu
