Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
Created 2023-10-08
11 commits to main branch, last one 8 months ago