Search Results - RepositoryStats

1 result found

107

apache-2.0

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

dpo llm rlhf alignment online-rl reasoning llm-aligment distributed-rl dueling-bandits llm-exploration online-alignment thompson-sampling distributed-training

Created 2024-10-15

26 commits to main branch, last one 13 hours ago