4 results found
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Created: 2023-07-30
688 commits to main branch, last one 24 hours ago
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Created: 2023-05-15
111 commits to main branch, last one 16 days ago
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Created: 2023-05-03
72 commits to main branch, last one 4 months ago
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Created: 2023-07-28
35 commits to main branch, last one 10 months ago