4 results found

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Created 2024-05-21
78 commits to main branch, last one about a month ago
17
189
unknown
4
[Paper][ACL 2024 Findings] Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
Created 2023-11-09
14 commits to main branch, last one 6 months ago
0
36
unknown
2
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Created 2024-05-22
9 commits to main branch, last one about a month ago
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
Created 2024-02-07
4 commits to main branch, last one 7 months ago