4 results found

154
1.7k
apache-2.0
22
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Created 2023-07-30
688 commits to main branch, last one 24 hours ago
107
1.2k
apache-2.0
17
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Created 2023-05-15
111 commits to main branch, last one 16 days ago
58
737
apache-2.0
8
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Created 2023-05-03
72 commits to main branch, last one 4 months ago
2
82
apache-2.0
5
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Created 2023-07-28
35 commits to main branch, last one 10 months ago