10 results found
- Filter by Primary Language:
- Python (7)
- Svelte (1)
- TypeScript (1)
- +
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28
3,899 commits to main branch, last one 14 hours ago
The LLM Evaluation Framework
Created 2023-08-10
4,491 commits to main branch, last one 15 hours ago
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Created 2024-04-11
470 commits to main branch, last one a day ago
The official evaluation suite and dynamic data release for MixEval.
Created 2024-06-01
120 commits to main branch, last one 4 months ago
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
Created 2024-09-20
264 commits to main branch, last one 26 days ago
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
1,092 commits to main branch, last one 25 days ago
MIT-licensed framework for testing LLMs, RAGs, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
Created 2024-10-30
134 commits to main branch, last one 2 months ago
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Created 2024-02-23
19 commits to master branch, last one 7 months ago
Develop reliable AI apps
Created 2024-11-25
47 commits to main branch, last one 4 days ago
Benchmarking Large Language Models for FHIR
Created 2024-07-25
1 commit to master branch, last one 3 months ago