Search Results - RepositoryStats

2 results found

329

apache-2.0

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

dpo llm ppo grpo rlhf r1-zero alignment online-rl reasoning llm-aligment distributed-rl dueling-bandits llm-exploration online-alignment thompson-sampling distributed-training

Created 2024-10-15

36 commits to main branch, last one 5 days ago

unknown

Implementation of distributed reinforcement learning with distributed TensorFlow

apex r2d2 impala tensorflow distributed-rl distributed-tensorflow reinforcement-learning scalable-reinforcement-learning distributed-reinforcement-learning

Created 2020-04-07

44 commits to master branch, last one 4 years ago