10 results found

65
557
apache-2.0
6
RewardBench: the first evaluation tool for reward models.
Created 2023-12-23
219 commits to main branch, last one about a month ago
Free and open source code of the https://tournesol.app platform. Meet the community on Discord https://discord.gg/WvcSG55Bf3
Created 2021-03-23
1,654 commits to main branch, last one 5 days ago
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
Created 2024-06-01
36 commits to master branch, last one 4 months ago
11
77
isc
6
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
Created 2019-11-12
144 commits to pyglet1.5 branch, last one 3 years ago
This repository contains the source code for our paper: "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning". For more details, please refe...
Created 2023-04-20
38 commits to master branch, last one about a month ago
Python-based GUI for collecting chemists' feedback on molecules
Created 2024-04-29
31 commits to main branch, last one 7 months ago
3
43
mit
3
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
Created 2024-06-12
7 commits to main branch, last one 8 days ago
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
Created 2024-04-25
15 commits to main branch, last one 5 months ago
A Survey of Direct Preference Optimization (DPO)
Created 2024-11-26
52 commits to main branch, last one about a month ago
Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight)
Created 2024-04-04
17 commits to main branch, last one 6 months ago