Search Results - RepositoryStats

RLHF-Reward-Modeling RLHFlow

84

1.2k

apache-2.0

19

Recipes for training reward models for RLHF.

llm rlhf llama3 reward-models

Created 2024-03-21

135 commits to main branch, last one 15 days ago

Vicuna-LoRA-RLHF-PyTorch jackaduma

18

211

mit

7

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT...

gpt llm ppo lora peft rlhf llama vicuna chatgpt pytorch finetune vicuna-7b reward-models

Created 2023-04-22

21 commits to main branch, last one about a year ago

ChatGLM-LoRA-RLHF-PyTorch jackaduma

10

134

mit

6

A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatG...

gpt llm ppo lora peft rlhf llama chatglm chatgpt pytorch finetune deepspeed chatglm-6b reward-models

Created 2023-04-18

17 commits to main branch, last one about a year ago

ReNO ExplainableML

10

123

mit

5

[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

reward-models text-to-image stable-diffusion text-to-image-generation

Created 2024-05-27

16 commits to main branch, last one 27 days ago

Alpaca-LoRA-RLHF-PyTorch jackaduma

6

58

mit

4

A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT...