8 results found
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Created 2023-07-30 · 1,219 commits to main branch, last one 17 hours ago
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Created 2023-05-15 · 111 commits to main branch, last one 10 months ago
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Created 2023-05-03 · 72 commits to main branch, last one about a year ago
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Created 2024-06-18 · 1,078 commits to main branch, last one 3 months ago
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Created 2023-07-28 · 35 commits to main branch, last one about a year ago
A repo for RLHF training and best-of-N (BoN) sampling over LLMs, with support for reward model ensembles.
Created 2023-12-02 · 6 commits to main branch, last one 2 months ago
A Survey of Direct Preference Optimization (DPO)
Created 2024-11-26 · 52 commits to main branch, last one 28 days ago
Official code for the ICML 2024 Spotlight paper "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences"
Created 2024-04-04 · 17 commits to main branch, last one 6 months ago