Search Results - RepositoryStats

179

2.5k

apache-2.0

22

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

ai llms rlhf rlaif openai python huggingface synthetic-data synthetic-dataset-generation

Created 2023-10-16

840 commits to main branch, last one about a month ago

awesome-RLAIF mengdi-li

4

157

apache-2.0

6

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)

rl llms rlhf rlaif alignment

Created 2023-09-17

41 commits to main branch, last one about a month ago

VideoDPO CIntellifusion

0

53

unknown

1

Official implementation of VideoDPO

aigc rlhf rlaif generative-ai videogeneration diffusion-models self-improvement

Created 2024-12-19

91 commits to main branch, last one about a month ago

Prompt-OIRL holarissun

6

37

mit

2

Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"

irl llm rlhf rlaif offline-rl offline-irl prompt-engineering large-language-models inverse-reinforcement-learning

Created 2023-09-10

58 commits to main branch, last one 11 months ago

zero-shot-reward-models vicgalle

8

33

mit

2

ZYN: Zero-Shot Reward Models with Yes-No Questions

llm rlhf trlx rlaif zero-shot reward-models reinforcement-learning

Created 2023-03-03

21 commits to main branch, last one about a year ago

openpo dannylee1020

0

27

apache-2.0

2

This repository has no description...

ai dpo llm rlhf rlaif python evaluation finetuning ai-feedback huggingface llm-evaluation synthetic-data synthetic-data-generation

Created 2024-10-28

260 commits to master branch, last one 2 months ago