10 results found

🐢 Open-Source Evaluation & Testing for LLMs and ML models
228 · 3.7k · apache-2.0 · 29
Created 2022-03-06 · 9,731 commits to main branch, last one a day ago

Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comman...
230 · 3.4k · mit · 18
Created 2023-04-28 · 1,361 commits to main branch, last one 20 hours ago

AI Observability & Evaluation
211 · 3.0k · other · 27
Created 2022-11-09 · 2,531 commits to main branch, last one 13 hours ago

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
174 · 2.1k · apache-2.0 · 20
Created 2022-11-07 · 764 commits to main branch, last one 18 days ago

Python SDK for running evaluations on LLM-generated responses
Created 2023-11-22 · 455 commits to main branch, last one a day ago

Generate ideal question-answer pairs for testing RAG
Created 2023-07-04 · 45 commits to master branch, last one a day ago

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created 2023-11-19 · 40 commits to main branch, last one 4 months ago

A benchmark comparing Russian ChatGPT analogues: Saiga, YandexGPT, Gigachat
2 · 54 · unknown · 3
Created 2023-08-23 · 112 commits to master branch, last one 9 months ago

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
4 · 43 · apache-2.0 · 2
Created 2023-07-24 · 925 commits to main branch, last one 6 days ago

🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
Created 2024-02-17 · 278 commits to main branch, last one about a month ago