47 results found
Filter by primary language:
- Python (30)
- TypeScript (6)
- Jupyter Notebook (4)
- HTML (1)
- Lean (1)
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created
2023-05-18
3,257 commits to main branch, last one 22 hours ago
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created
2023-04-28
3,165 commits to main branch, last one 12 hours ago
🐢 Open-Source Evaluation & Testing for AI & LLM systems
Created
2022-03-06
10,170 commits to main branch, last one 2 days ago
The LLM Evaluation Framework
Created
2023-08-10
3,948 commits to main branch, last one a day ago
the LLM vulnerability scanner
Created
2023-05-10
1,492 commits to main branch, last one a day ago
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Created
2024-01-10
837 commits to main branch, last one 2 days ago
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Created
2023-01-31
3,088 commits to main branch, last one 17 hours ago
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
Created
2024-04-09
170 commits to main branch, last one 21 days ago
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place.
Created
2023-04-26
10,700 commits to main branch, last one 2 days ago
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Created
2024-08-29
257 commits to main branch, last one 11 hours ago
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandab...
Created
2024-04-22
297 commits to main branch, last one 7 days ago
Data-Driven Evaluation for LLM-Powered Applications
Created
2023-12-08
96 commits to main branch, last one 3 months ago
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs, aiming to explore the technical boundaries of generative AI.
Created
2023-04-26
263 commits to main branch, last one about a month ago
a curated list of 🌌 Azure OpenAI, 🦙Large Language Models, and references with notes.
Created
2023-04-13
175 commits to main branch, last one 3 days ago
Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework
Created
2024-03-12
419 commits to main branch, last one 24 days ago
Awesome papers involving LLMs in Social Science.
Created
2023-10-15
132 commits to main branch, last one a day ago
Python SDK for running evaluations on LLM generated responses
Created
2023-11-22
639 commits to main branch, last one 3 days ago
The official evaluation suite and dynamic data release for MixEval.
Created
2024-06-01
120 commits to main branch, last one about a month ago
Connect agents to live web environments for evaluation.
Created
2024-06-06
507 commits to main branch, last one 6 days ago
A list of LLMs Tools & Projects
Created
2023-05-09
63 commits to main branch, last one 26 days ago
Superpipe - optimized LLM pipelines for structured data
Created
2024-02-07
99 commits to main branch, last one 6 months ago
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Created
2023-11-15
7 commits to main branch, last one 3 months ago
Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation
Created
2024-08-28
137 commits to trunk branch, last one 2 months ago
Framework for LLM evaluation, guardrails and security
Created
2024-03-02
9 commits to main branch, last one 3 months ago
Evaluating LLMs with CommonGen-Lite
Created
2024-01-04
37 commits to main branch, last one 11 months ago
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
Created
2024-05-29
78 commits to main branch, last one 11 days ago
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
Created
2024-05-27
14 commits to master branch, last one 6 months ago
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created
2023-11-19
40 commits to main branch, last one 10 months ago
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created
2023-07-24
1,067 commits to main branch, last one 3 months ago
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessmen...
Created
2024-04-02
247 commits to main branch, last one 7 days ago