7 results found

414 · 5.1k · MIT · 21
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28 · 3,223 commits to main branch, last one 11 hours ago

339 · 4.2k · Apache-2.0 · 26
The LLM Evaluation Framework
Created 2023-08-10 · 3,980 commits to main branch, last one 10 hours ago

91 · 878 · Apache-2.0 · 16
Agentic LLM Vulnerability Scanner / AI red teaming kit
Created 2024-04-11 · 194 commits to main branch, last one 16 hours ago

37 · 230 · unknown license · 1
The official evaluation suite and dynamic data release for MixEval.
Created 2024-06-01 · 120 commits to main branch, last one about a month ago

6 · 75 · Apache-2.0 · 2
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24 · 1,067 commits to main branch, last one 3 months ago

MIT-licensed Framework for LLMs, RAGs, Chatbots testing. Configurable via YAML and integrable into CI pipelines for automated testing.
Created 2024-10-30 · 134 commits to main branch, last one 22 days ago

2 · 35 · unknown license · 3
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Created 2024-02-23 · 19 commits to master branch, last one 5 months ago