47 results found
Filter by primary language:
- Python (30)
- TypeScript (6)
- Jupyter Notebook (4)
- HTML (1)
- Lean (1)
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created
2023-05-18
3,257 commits to main branch, last one 22 hours ago
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created
2023-04-28
3,165 commits to main branch, last one 12 hours ago
🐢 Open-Source Evaluation & Testing for AI & LLM systems
Created
2022-03-06
10,170 commits to main branch, last one 2 days ago
The LLM Evaluation Framework
Created
2023-08-10
3,948 commits to main branch, last one a day ago
the LLM vulnerability scanner
Created
2023-05-10
1,492 commits to main branch, last one a day ago
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Created
2024-01-10
837 commits to main branch, last one 2 days ago
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Created
2023-01-31
3,088 commits to main branch, last one 17 hours ago
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
Created
2024-04-09
170 commits to main branch, last one 21 days ago
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place.
Created
2023-04-26
10,700 commits to main branch, last one 2 days ago
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Created
2024-08-29
257 commits to main branch, last one 11 hours ago
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandab...
Created
2024-04-22
297 commits to main branch, last one 7 days ago
Data-Driven Evaluation for LLM-Powered Applications
Created
2023-12-08
96 commits to main branch, last one 3 months ago
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs, aiming to explore the technical boundaries of generative AI.
Created
2023-04-26
263 commits to main branch, last one about a month ago
a curated list of 🌌 Azure OpenAI, 🦙Large Language Models, and references with notes.
Created
2023-04-13
175 commits to main branch, last one 3 days ago
Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework
Created
2024-03-12
419 commits to main branch, last one 24 days ago
Awesome papers involving LLMs in Social Science.
Created
2023-10-15
132 commits to main branch, last one a day ago
Python SDK for running evaluations on LLM generated responses
Created
2023-11-22
639 commits to main branch, last one 3 days ago
The official evaluation suite and dynamic data release for MixEval.
Created
2024-06-01
120 commits to main branch, last one about a month ago
Connect agents to live web environments for evaluation.
Created
2024-06-06
507 commits to main branch, last one 6 days ago
A list of LLMs Tools & Projects
Created
2023-05-09
63 commits to main branch, last one 26 days ago
Superpipe - optimized LLM pipelines for structured data
Created
2024-02-07
99 commits to main branch, last one 6 months ago
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Created
2023-11-15
7 commits to main branch, last one 3 months ago
Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation
Created
2024-08-28
137 commits to trunk branch, last one 2 months ago
Framework for LLM evaluation, guardrails and security
Created
2024-03-02
9 commits to main branch, last one 3 months ago
Evaluating LLMs with CommonGen-Lite
Created
2024-01-04
37 commits to main branch, last one 11 months ago
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
Created
2024-05-29
78 commits to main branch, last one 11 days ago
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
Created
2024-05-27
14 commits to master branch, last one 6 months ago
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created
2023-11-19
40 commits to main branch, last one 10 months ago
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created
2023-07-24
1,067 commits to main branch, last one 3 months ago
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessmen...
Created
2024-04-02
247 commits to main branch, last one 7 days ago