10 results found
🐢 Open-Source Evaluation & Testing for LLMs and ML models
Created 2022-03-06
9,731 commits to main branch, last one a day ago
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comman...
Created 2023-04-28
1,361 commits to main branch, last one 20 hours ago
AI Observability & Evaluation
Created 2022-11-09
2,531 commits to main branch, last one 13 hours ago
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Created 2022-11-07
764 commits to main branch, last one 18 days ago
Python SDK for running evaluations on LLM-generated responses
Created 2023-11-22
455 commits to main branch, last one a day ago
Generate ideal question-answer pairs for testing RAG
Created 2023-07-04
45 commits to master branch, last one a day ago
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created 2023-11-19
40 commits to main branch, last one 4 months ago
A benchmark comparing Russian ChatGPT analogues: Saiga, YandexGPT, Gigachat
Created 2023-08-23
112 commits to master branch, last one 9 months ago
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
925 commits to main branch, last one 6 days ago
🎯 A free LLM evaluation toolkit for assessing factual accuracy, contextual understanding, tone, and more, so you can gauge the quality of your LLM applications.
Created 2024-02-17
278 commits to main branch, last one about a month ago