7 results found
1. The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability, all in one place.
   Created 2023-04-26 · 12,021 commits to main branch, last one 5 days ago

2. Evaluate your LLM's response with Prometheus and GPT-4 💯
   Created 2024-04-18 · 205 commits to main branch, last one 2 months ago

3. 🤠 Agent-as-a-Judge and DevAI dataset
   Created 2024-10-16 · 20 commits to main branch, last one 4 months ago

4. [ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
   Created 2024-05-19 · 41 commits to main branch, last one 12 days ago

5. CodeUltraFeedback: aligning large language models to coding preferences
   Created 2024-01-25 · 51 commits to main branch, last one 8 months ago

6. Repository for the survey of Bias and Fairness in IR with LLMs.
   Created 2024-03-18 · 52 commits to main branch, last one 19 hours ago

7. Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
   Created 2024-06-11 · 32 commits to main branch, last one 15 days ago