Issues: huggingface/trl

Issues list

GKD trainer doesn't work too well with the llama series [labels: 🐛 bug, 🏋 GKD]
#2586 opened Jan 17, 2025 by Omar-Deepshard
5 tasks done
Make PPOTrainer compatible with PRMs
#2577 opened Jan 16, 2025 by kyleliang919
ORPO on SFT dataset [labels: 🏋 ORPO, ❓ question]
#2570 opened Jan 15, 2025 by vitalyshalumov
7 of 9 tasks
RuntimeError: Function 'Log1PBackward0' returned nan values in its 0th output. [labels: 🐛 bug, 🏋 ORPO]
#2564 opened Jan 13, 2025 by zhaoxjmail
7 of 9 tasks
dpo_vlm.py [labels: 🐛 bug, 🏋 DPO, 👁️ VLM]
#2563 opened Jan 12, 2025 by liuchaohu
5 of 9 tasks
Problem with accelerate>=1.0.0 when running official PPO/RLOO examples [labels: ⚡accelerate, 🏋 PPO, 🏋 RLOO]
#2555 opened Jan 10, 2025 by dawidm
7 of 9 tasks
KTOTrainer should work when actual batch size==1 [labels: ✨ enhancement, 🏋 KTO]
#2554 opened Jan 10, 2025 by starmpcc
DPO loss constant, logits chosen/rejected identical, and rewards nan [labels: 🐛 bug, 🏋 DPO]
#2553 opened Jan 9, 2025 by solume
7 of 9 tasks
Finetuning on the last turn of multi-turn conversations [labels: ❓ question, 🏋 SFT]
#2545 opened Jan 6, 2025 by okhat
Is truncation_mode used in DPOTrainer? [labels: 🏋 DPO, ❓ question]
#2538 opened Jan 2, 2025 by anakin87
Different finetune speed in DPO task of peft and ms-swift (600/S iter vs 30/s iter) [labels: 🏋 DPO, 🙋 help from community wanted, ⚡ PEFT]
#2536 opened Jan 2, 2025 by maoulee
7 of 9 tasks
(Willing to PR) Will it be welcomed if speeding up algorithms like PPO and code refactor/cleanup? [labels: 🏋 PPO, ❓ question, 🏋 RLOO]
#2535 opened Dec 31, 2024 by fzyzcjy
Using "beam search" strategy while generating the responses [labels: 🙋 help from community wanted, 🏋 PPO]
#2534 opened Dec 31, 2024 by SachinVashisth
onlinedpo error when use deepspeed zero3 [labels: 🐛 bug, 🚀 deepspeed, ⏳ needs more info, 🏋 Online DPO]
#2532 opened Dec 30, 2024 by yiyepiaoling0715
5 of 9 tasks
PPOTrainer: num_mini_batches setting affects training progress bar in an unexpected way [labels: 🐛 bug, 🏋 PPO]
#2530 opened Dec 29, 2024 by dawidm
6 of 9 tasks
Option to disable unwrapping model for generation in PPO/RLOO/OnlineDPO [labels: ✨ enhancement, 🏋 Online DPO, 🏋 PPO, 🏋 RLOO]
#2529 opened Dec 28, 2024 by dawidm
Direct Q-Function Optimization [labels: ✨ enhancement]
#2526 opened Dec 28, 2024 by catherinelee274
Integrate OREO into TRL and HF [labels: ✨ enhancement]
#2525 opened Dec 28, 2024 by August-murr
3 tasks done