47 results found

679
7.4k
other
28
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created 2023-05-18
3,257 commits to main branch, last one 22 hours ago
405
5.0k
mit
21
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28
3,165 commits to main branch, last one 12 hours ago
330
4.1k
apache-2.0
23
The LLM Evaluation Framework
Created 2023-08-10
3,948 commits to main branch, last one a day ago
265
3.1k
apache-2.0
30
the LLM vulnerability scanner
Created 2023-05-10
1,492 commits to main branch, last one a day ago
228
3.0k
apache-2.0
23
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Created 2024-01-10
837 commits to main branch, last one 2 days ago
270
2.5k
apache-2.0
14
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Created 2023-01-31
3,088 commits to main branch, last one 17 hours ago
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
Created 2024-04-09
170 commits to main branch, last one 21 days ago
215
1.6k
mit
22
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place.
Created 2023-04-26
10,700 commits to main branch, last one 2 days ago
70
1.4k
apache-2.0
5
Laminar - open-source all-in-one platform for engineering AI products. Create data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Created 2024-08-29
257 commits to main branch, last one 11 hours ago
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandab...
Created 2024-04-22
297 commits to main branch, last one 7 days ago
31
456
apache-2.0
4
Data-Driven Evaluation for LLM-Powered Applications
Created 2023-12-08
96 commits to main branch, last one 3 months ago
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for evaluation of foundation LLMs, aiming to explore the technical boundaries of generative AI.
Created 2023-04-26
263 commits to main branch, last one about a month ago
a curated list of 🌌 Azure OpenAI, 🦙Large Language Models, and references with notes.
Created 2023-04-13
175 commits to main branch, last one 3 days ago
Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework
Created 2024-03-12
419 commits to main branch, last one 24 days ago
Awesome papers involving LLMs in Social Science.
Created 2023-10-15
132 commits to main branch, last one a day ago
Python SDK for running evaluations on LLM generated responses
Created 2023-11-22
639 commits to main branch, last one 3 days ago
37
229
unknown
1
The official evaluation suite and dynamic data release for MixEval.
Created 2024-06-01
120 commits to main branch, last one about a month ago
Connect agents to live web environments for evaluation.
Created 2024-06-06
507 commits to main branch, last one 6 days ago
24
157
apache-2.0
2
A list of LLMs Tools & Projects
Created 2023-05-09
63 commits to main branch, last one 26 days ago
Superpipe - optimized LLM pipelines for structured data
Created 2024-02-07
99 commits to main branch, last one 6 months ago
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Created 2023-11-15
7 commits to main branch, last one 3 months ago
7
101
apache-2.0
7
Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation
Created 2024-08-28
137 commits to trunk branch, last one 2 months ago
Framework for LLM evaluation, guardrails and security
Created 2024-03-02
9 commits to main branch, last one 3 months ago
3
87
apache-2.0
6
Evaluating LLMs with CommonGen-Lite
Created 2024-01-04
37 commits to main branch, last one 11 months ago
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
Created 2024-05-29
78 commits to main branch, last one 11 days ago
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
Created 2024-05-27
14 commits to master branch, last one 6 months ago
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created 2023-11-19
40 commits to main branch, last one 10 months ago
6
75
apache-2.0
2
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
1,067 commits to main branch, last one 3 months ago
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessmen...
Created 2024-04-02
247 commits to main branch, last one 7 days ago