5 results found

379
4.8k
mit
21
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28
2,842 commits to main branch, last one 8 hours ago
298
3.7k
apache-2.0
21
The LLM Evaluation Framework
Created 2023-08-10
3,808 commits to main branch, last one a day ago
6
74
apache-2.0
2
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
1,067 commits to main branch, last one 2 months ago
2
32
unknown
3
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Created 2024-02-23
19 commits to master branch, last one 4 months ago