Search Results - RepositoryStats

13

232

apache-2.0

5

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

dpo llm ppo grpo rlhf r1-zero alignment online-rl reasoning llm-aligment distributed-rl dueling-bandits llm-exploration online-alignment thompson-sampling distributed-training

Created 2024-10-15

31 commits to main branch, last one 2 days ago

26

132

apache-2.0

3

👤 Multi-Armed Bandit Algorithms Library (MAB) 👮

arm mab ucb rank reward algorithm ranked-mab simulation monte-carlo ranking-algorithm thompson-sampling contextual-bandits multi-armed-bandit montecarlo-simulation reinforcement-learning reinforcement-learning-algorithms

Created 2019-01-24

69 commits to master branch, last one 2 years ago

7

54

apache-2.0

7

Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

go golang thompson data-science experimentation thompson-sampling multi-armed-bandit multiarmed-bandits multi-armed-bandits reinforcement-learning

Created 2021-02-18

20 commits to main branch, last one 4 years ago