6 results found

Recipes to train reward models for RLHF.
Created 2024-03-21
61 commits to main branch, last one 12 days ago

A full pipeline to finetune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT...
Created 2023-04-22
21 commits to main branch, last one about a year ago

A full pipeline to finetune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatG...
Created 2023-04-18
17 commits to main branch, last one about a year ago

A full pipeline to finetune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT...
Created 2023-04-18
25 commits to main branch, last one about a year ago

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Created 2024-05-27
5 commits to main branch, last one 13 days ago

ZYN: Zero-Shot Reward Models with Yes-No Questions
Created 2023-03-03
21 commits to main branch, last one 10 months ago