14 results found

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Created 2022-12-09
128 commits to main branch, last one 11 months ago
216 · 3.6k · apache-2.0 · 62
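
Most of the RLHF projects in these results share the same core ingredient: a reward model trained on pairwise human preferences, which the policy is then optimized against (typically with PPO). Below is a minimal PyTorch sketch of that preference-modeling step; the model, tensor names, and shapes are illustrative assumptions, not code from any repository listed here.

```python
# Minimal sketch of reward-model training on pairwise human preferences,
# the step shared by most RLHF pipelines in these results. All names and
# shapes are illustrative assumptions, not taken from any listed repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Scores an (already encoded) response with a single scalar reward."""
    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(response_embedding).squeeze(-1)  # (batch,)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: the human-preferred response should
    # receive a higher reward than the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy training step on random "embeddings" standing in for encoded responses.
model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen, rejected = torch.randn(8, 64), torch.randn(8, 64)
optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```
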
A curated list of reinforcement learning with human feedback resources (continually updated)
Created 2023-02-13
70 commits to main branch, last one 16 days ago
Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.
This repository has been archived.
Created 2022-06-21
88 commits to main branch, last one 10 months ago
Let's build better datasets, together!
Created 2024-03-11
139 commits to main branch, last one a day ago
18 · 179 · mit · 8
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
Created 2023-11-23
37 commits to main branch, last one 8 months ago
23 · 172 · unknown · 3
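
The paper above fine-tunes a diffusion model on human preferences without training a separate reward model. Methods in this family build on a DPO-style pairwise objective; the sketch below shows that generic objective under assumed names (policy_logp_*, ref_logp_*, beta), and is not the paper's exact formulation for diffusion models.

```python
# Generic DPO-style pairwise preference loss: fine-tune a policy directly on
# preference pairs without a separate reward model. Simplified illustration
# with assumed names; not the exact objective of the paper above.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_logp_chosen: torch.Tensor,    # log-prob of the preferred sample under the policy
    policy_logp_rejected: torch.Tensor,  # log-prob of the rejected sample under the policy
    ref_logp_chosen: torch.Tensor,       # same quantities under a frozen reference model
    ref_logp_rejected: torch.Tensor,
    beta: float = 0.1,                   # strength of the implicit KL regularizer
) -> torch.Tensor:
    # Log-ratio of policy vs. reference for each side of the preference pair.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    # Push the preferred sample's ratio above the rejected one's.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random log-probabilities standing in for real model outputs.
lp_c, lp_r = torch.randn(4), torch.randn(4)
ref_c, ref_r = torch.randn(4), torch.randn(4)
print(dpo_loss(lp_c, lp_r, ref_c, ref_r))
```
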
The ParroT framework enhances and regulates translation abilities during chat, built on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.
Created 2023-03-22
177 commits to master branch, last one about a year ago
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Created 2022-12-28
76 commits to main branch, last one about a year ago
25 · 137 · apache-2.0 · 6
Product analytics for AI Assistants
Created 2022-01-19
939 commits to main branch, last one 7 months ago
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Created 2023-06-14
3 commits to main branch, last one about a year ago
The Prism Alignment Project
Created 2024-03-06
12 commits to main branch, last one 8 months ago
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
Created 2024-08-07
181 commits to main branch, last one 3 months ago
[ECCV 2024] Towards Reliable Advertising Image Generation Using Human Feedback
Created 2024-07-04
42 commits to main branch, last one about a month ago
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
Created 2024-05-19
6 commits to main branch, last one 5 months ago
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
Created 2024-04-25
15 commits to main branch, last one 28 days ago