7 results found
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)
Created 2023-07-30
979 commits to main branch, last one 2 days ago
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Created 2023-05-15
111 commits to main branch, last one 5 months ago
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Created 2023-05-03
72 commits to main branch, last one 9 months ago
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Created 2024-06-18
1,073 commits to main branch, last one 6 days ago
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Created 2023-07-28
35 commits to main branch, last one about a year ago
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
Created 2023-12-02
2 commits to main branch, last one 8 months ago
Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight)
Created 2024-04-04
17 commits to main branch, last one about a month ago