4 results found

230
3.4k
mit
18
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comman...
Created 2023-04-28
1,361 commits to main branch, last one 20 hours ago
159
2.3k
apache-2.0
16
The LLM Evaluation Framework
Created 2023-08-10
3,019 commits to main branch, last one a day ago
4
43
apache-2.0
2
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
925 commits to main branch, last one 6 days ago