7 results found
1. The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability, all in one place.
   Created 2023-04-26 · 12,021 commits to main branch, last one 5 days ago

2. Evaluate your LLM's response with Prometheus and GPT-4 💯
   Created 2024-04-18 · 205 commits to main branch, last one 2 months ago

3. 🤠 Agent-as-a-Judge and DevAI dataset
   Created 2024-10-16 · 20 commits to main branch, last one 4 months ago

4. [ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
   Created 2024-05-19 · 41 commits to main branch, last one 12 days ago

5. CodeUltraFeedback: aligning large language models to coding preferences
   Created 2024-01-25 · 51 commits to main branch, last one 8 months ago

6. Repository for the survey of Bias and Fairness in IR with LLMs.
   Created 2024-03-18 · 52 commits to main branch, last one 19 hours ago

7. Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
   Created 2024-06-11 · 32 commits to main branch, last one 15 days ago