10 results found Sort:

513
6.2k
mit
21
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28
4,212 commits to main branch, last one 19 hours ago
524
6.0k
apache-2.0
26
The LLM Evaluation Framework
Created 2023-08-10
4,774 commits to main branch, last one 23 hours ago
204
1.3k
apache-2.0
17
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Created 2024-04-11
581 commits to main branch, last one 12 days ago
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
Created 2024-09-20
316 commits to main branch, last one 5 days ago
6
76
apache-2.0
1
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
1,092 commits to main branch, last one 2 months ago
MIT-licensed Framework for LLMs, RAGs, Chatbots testing. Configurable via YAML and integrable into CI pipelines for automated testing.
Created 2024-10-30
134 commits to main branch, last one 4 months ago
2
36
unknown
2
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Created 2024-02-23
19 commits to master branch, last one 9 months ago
Develop reliable AI apps
Created 2024-11-25
62 commits to main branch, last one 4 days ago
Benchmarking Large Language Models for FHIR
Created 2024-07-25
1 commits to master branch, last one 4 months ago