14 results found

Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM
Created 2022-12-09
128 commits to main branch, last one 9 months ago
211 · 3.4k · apache-2.0 · 62
A curated list of reinforcement learning with human feedback resources (continually updated)
Created 2023-02-13
63 commits to main branch, last one 9 days ago
Open-source pre-training implementation of Google's LaMDA in PyTorch, with RLHF added, similar to ChatGPT.
This repository has been archived.
Created 2022-06-21
88 commits to main branch, last one 8 months ago
Let's build better datasets, together!
Created 2024-03-11
113 commits to main branch, last one 3 months ago
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Created 2022-12-28
76 commits to main branch, last one about a year ago
14 · 168 · mit · 7
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
Created 2023-11-23
37 commits to main branch, last one 7 months ago
24 · 167 · unknown · 2
The ParroT framework to enhance and regulate the translation abilities of chat-based open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt), using human-written translation and evaluation data.
Created 2023-03-22
177 commits to master branch, last one about a year ago
25 · 131 · apache-2.0 · 5
Product analytics for AI Assistants
Created 2022-01-19
939 commits to main branch, last one 5 months ago
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Created 2023-06-14
3 commits to main branch, last one about a year ago
Dataset Viber is your chill repo for data collection, annotation, and vibe checks.
Created 2024-08-07
181 commits to main branch, last one 2 months ago
The Prism Alignment Project
Created 2024-03-06
12 commits to main branch, last one 6 months ago
[ECCV2024] Towards Reliable Advertising Image Generation Using Human Feedback
Created 2024-07-04
41 commits to main branch, last one 2 months ago
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
Created 2024-05-19
6 commits to main branch, last one 3 months ago
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
Created 2024-04-25
14 commits to main branch, last one 16 days ago