10 results found

🐢 Open-Source Evaluation & Testing for LLMs and ML models
228 · 3.7k · apache-2.0 · 29
Created 2022-03-06 · 9,731 commits to main branch, last one a day ago

Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comman...
230 · 3.4k · mit · 18
Created 2023-04-28 · 1,361 commits to main branch, last one 20 hours ago

AI Observability & Evaluation
211 · 3.0k · other · 27
Created 2022-11-09 · 2,531 commits to main branch, last one 13 hours ago

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
174 · 2.1k · apache-2.0 · 20
Created 2022-11-07 · 764 commits to main branch, last one 18 days ago

Python SDK for running evaluations on LLM-generated responses
Created 2023-11-22 · 455 commits to main branch, last one a day ago

Generate ideal question-answer pairs for testing RAG
Created 2023-07-04 · 45 commits to master branch, last one a day ago

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created 2023-11-19 · 40 commits to main branch, last one 4 months ago

A benchmark comparing Russian ChatGPT analogues: Saiga, YandexGPT, Gigachat
2 · 54 · unknown · 3
Created 2023-08-23 · 112 commits to master branch, last one 9 months ago

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
4 · 43 · apache-2.0 · 2
Created 2023-07-24 · 925 commits to main branch, last one 6 days ago

🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
Created 2024-02-17 · 278 commits to main branch, last one about a month ago