9 results found
Recipes to train reward model for RLHF.
Created 2024-03-21
128 commits to main branch, last one 9 days ago
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT...
Created 2023-04-22
21 commits to main branch, last one about a year ago
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatG...
Created 2023-04-18
17 commits to main branch, last one about a year ago
[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Created 2024-05-27
11 commits to main branch, last one 2 months ago
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT...
Created 2023-04-18
25 commits to main branch, last one about a year ago
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Created 2024-06-11
29 commits to main branch, last one about a month ago
GenRM-CoT: Data release for verification rationales
Created 2024-10-16
36 commits to main branch, last one 2 months ago
ZYN: Zero-Shot Reward Models with Yes-No Questions
Created 2023-03-03
21 commits to main branch, last one about a year ago
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
Created 2023-12-02
2 commits to main branch, last one 9 months ago