2 results found
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
Created 2023-12-02
2 commits to main branch, last one 8 months ago
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
Created 2024-10-17
5 commits to main branch, last one 23 days ago