3 results found
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Created 2024-10-15
28 commits to main branch, last one 11 days ago
👤 Multi-Armed Bandit Algorithms Library (MAB) 👮
Created 2019-01-24
69 commits to master branch, last one 2 years ago
Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.
Created 2021-02-18
20 commits to main branch, last one 3 years ago
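The third result names two bandit selection strategies without showing how they work. Below is a minimal sketch of both. It assumes Bernoulli (0/1) rewards, Beta(1, 1) priors for Thompson sampling, and reads "deterministic" as "reproducible given a seed" via a seeded `random.Random`. That reading may differ from what the library means. The names `EpsilonGreedy`, `BernoulliThompson`, and `run` are hypothetical and are not that library's API.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy arm selection (hypothetical class, not the library's API).

    A seeded RNG makes the sequence of choices reproducible.
    """

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        best = max(self.values)
        return self.values.index(best)  # ties go to the lowest index

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean: v <- v + (r - v) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class BernoulliThompson:
    """Thompson sampling for 0/1 rewards with Beta(1, 1) priors (hypothetical)."""

    def __init__(self, n_arms: int, seed: int = 0) -> None:
        self.alpha = [1.0] * n_arms  # 1 + observed successes
        self.beta = [1.0] * n_arms   # 1 + observed failures
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Draw one sample from each arm's Beta posterior and play the argmax.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: int) -> None:
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1


def run(policy, probs, steps, seed=1):
    """Simulate Bernoulli arms with success probabilities `probs`.

    Returns the number of pulls per arm. The environment has its own
    seeded RNG, separate from the policy's.
    """
    env = random.Random(seed)
    pulls = [0] * len(probs)
    for _ in range(steps):
        arm = policy.select()
        reward = 1 if env.random() < probs[arm] else 0
        policy.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

Calling `run(BernoulliThompson(2, seed=7), [0.1, 0.9], 2000)` twice returns the same pull counts both times, and most pulls go to the second arm. Keeping the policy RNG and the environment RNG separate means a change to one seed does not silently perturb the other.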