Search Results - RepositoryStats

513

6.2k

mit

21

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...

ci llm rag cicd ci-cd llmops prompts testing llm-eval evaluation pentesting red-teaming llm-evaluation prompt-testing prompt-engineering evaluation-framework vulnerability-scanners llm-evaluation-framework

Created 2023-04-28

4,212 commits to main branch, last one 19 hours ago

deepeval confident-ai

524

6.0k

apache-2.0

26

The LLM Evaluation Framework

llm-evaluation evaluation-metrics evaluation-framework llm-evaluation-metrics llm-evaluation-framework

Created 2023-08-10

4,774 commits to main branch, last one 23 hours ago

agentic_security msoedov

204

1.3k

apache-2.0

17

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

llm-fuzzer ai-red-team llm-fuzzing llm-scanner llm-security agent-security llm-evaluation llm-guardrails llm-jailbreaks prompt-testing agent-framework llm-vulnerabilities llm-fuzzer-aggregator llm-evaluation-framework

Created 2024-04-11

581 commits to main branch, last one 12 days ago

MixEval JinjieNi

41

235

unknown

1

The official evaluation suite and dynamic data release for MixEval.

mixeval benchmark evaluation llm-inference llm-evaluation benchmark-mixture foundation-models benchmarking-suite evaluation-framework large-language-model large-language-models benchmarking-framework large-multimodal-models llm-evaluation-framework

Created 2024-06-01

120 commits to main branch, last one 5 months ago

langfair cvs-health

32

201

other

6

LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.

ai llm bias python fairness ai-safety ethical-ai fairness-ai fairness-ml bias-detection llm-evaluation responsible-ai fairness-testing large-language-models llm-evaluation-metrics artificial-intelligence llm-evaluation-framework

Created 2024-09-20

316 commits to main branch, last one 5 days ago

parea-sdk-py parea-ai

6

76

apache-2.0

1

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

llm llmops metrics llm-eval llm-tools generative-ai llm-evaluation good-first-issue llms-benchmarking prompt-engineering llm-evaluation-toolkit llm-evaluation-framework

Created 2023-07-24

1,092 commits to main branch, last one 2 months ago

contextcheck Addepto

8

64

mit

2

MIT-licensed framework for testing LLMs, RAGs, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.

ci llm rag ai-chat ai-testing llm-testing open-source prompt-test rag-testing testing-tools llm-evaluation ai-testing-tool chatbot-testing chatbot-framework testing-framework generative-ai-testing large-language-models summarization-testing llm-evaluation-framework

Created 2024-10-30

134 commits to main branch, last one 4 months ago