23 results found

Filter by primary language:
- Python (13)
- TypeScript (3)
- Jupyter Notebook (2)
- Lean (1)
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created 2023-05-18 · 2,062 commits to main branch, last one 12 hours ago

🐢 Open-Source Evaluation & Testing for LLMs and ML models
Created 2022-03-06 · 9,731 commits to main branch, last one a day ago

Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comman...
Created 2023-04-28 · 1,361 commits to main branch, last one 20 hours ago

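The "simple declarative configs" pattern mentioned in the entry above can be sketched generically. The schema, prompt, and assertion names below are invented for illustration and do not reflect any particular tool's actual config format:

```python
# Hypothetical declarative prompt-eval config: a prompt template plus
# test cases, each with variables and a simple "contains" assertion.
config = {
    "prompt": "Translate to French: {text}",
    "tests": [
        {"vars": {"text": "hello"}, "assert": {"contains": "bonjour"}},
        {"vars": {"text": "goodbye"}, "assert": {"contains": "au revoir"}},
    ],
}

def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g. GPT, Claude, Gemini, Llama).
    canned = {"hello": "bonjour", "goodbye": "au revoir"}
    word = prompt.rsplit(": ", 1)[-1]
    return canned.get(word, "")

def run_evals(config, model):
    # Render each test case's prompt, run the model, check the assertion.
    results = []
    for test in config["tests"]:
        prompt = config["prompt"].format(**test["vars"])
        output = model(prompt)
        passed = test["assert"]["contains"] in output
        results.append({"vars": test["vars"], "output": output, "pass": passed})
    return results

results = run_evals(config, fake_model)
print(sum(r["pass"] for r in results), "/", len(results), "tests passed")
```

The appeal of the declarative style is that test cases live in data rather than code, so swapping the model provider or adding cases requires no changes to the runner.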
The LLM Evaluation Framework
Created 2023-08-10 · 3,019 commits to main branch, last one a day ago

The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
Created 2023-04-26 · 7,686 commits to main branch, last one 22 hours ago

Open-Source Evaluation for GenAI Application Pipelines
Created 2023-12-08 · 89 commits to main branch, last one 8 days ago

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for evaluation of LLMs, aiming to explore the technical boundaries of generative AI.
Created 2023-04-26 · 237 commits to main branch, last one 7 days ago

Awesome papers involving LLMs in Social Science.
Created 2023-10-15 · 81 commits to main branch, last one 10 days ago

The official evaluation suite and dynamic data release for MixEval.
Created 2024-06-01 · 35 commits to main branch, last one 7 days ago

Python SDK for running evaluations on LLM generated responses
Created 2023-11-22 · 455 commits to main branch, last one a day ago

Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandab...
Created 2024-04-22 · 93 commits to main branch, last one 2 days ago

Superpipe - optimized LLM pipelines for structured data
Created 2024-02-07 · 99 commits to main branch, last one 8 days ago

Framework for LLM evaluation, guardrails and security
Created 2024-03-02 · 5 commits to main branch, last one 27 days ago

Evaluating LLMs with CommonGen-Lite
Created 2024-01-04 · 37 commits to main branch, last one 5 months ago

A list of LLMs Tools & Projects
Created 2023-05-09 · 49 commits to main branch, last one a day ago

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created 2023-11-19 · 40 commits to main branch, last one 4 months ago

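The multi-aspect, interpretable assessment described above can be sketched generically. The aspect names are illustrative and the judge is a stub; a real tool would prompt a GPT-style model with a rubric for each aspect:

```python
from statistics import mean

ASPECTS = ["relevance", "fluency", "factuality"]  # illustrative aspects

def judge(aspect: str, question: str, answer: str) -> int:
    # Stub standing in for an LLM-based judge that returns a 1-5 score
    # for one aspect; a real implementation would call a model with a
    # rubric prompt and parse the score from its reply.
    return 5 if answer.strip() else 1

def assess(question: str, answer: str) -> dict:
    # Score each aspect separately, then aggregate; keeping per-aspect
    # scores is what makes the assessment interpretable.
    scores = {a: judge(a, question, answer) for a in ASPECTS}
    scores["overall"] = mean(scores.values())
    return scores

report = assess("What is 2+2?", "4")
print(report)
```

Scoring aspects independently, rather than asking for one holistic score, makes disagreements between evaluators easier to localize (e.g. fluent but non-factual answers).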
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
Created 2024-05-27 · 14 commits to master branch, last one 26 days ago

Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Created 2023-11-15 · 3 commits to main branch, last one 7 months ago

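A ranking like the one above reduces to computing a hallucination rate per model and sorting by it. A minimal sketch with made-up model names and counts (a real initiative would derive the counts by checking model outputs against source documents):

```python
# Toy data: hallucinated responses out of total responses per model.
results = {
    "model-a": {"hallucinated": 35, "total": 1000},
    "model-b": {"hallucinated": 82, "total": 1000},
    "model-c": {"hallucinated": 51, "total": 1000},
}

# Lower hallucination rate ranks higher.
leaderboard = sorted(
    ((name, r["hallucinated"] / r["total"]) for name, r in results.items()),
    key=lambda item: item[1],
)

for rank, (name, rate) in enumerate(leaderboard, 1):
    print(f"{rank}. {name}: {rate:.1%}")
```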
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute, relative and much more. It contains a list of all the availab...
Created 2024-05-11 · 29 commits to main branch, last one 26 days ago

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24 · 925 commits to main branch, last one 6 days ago

This repository has no description...
Created 2024-06-06 · 422 commits to main branch, last one 2 days ago

A collection of hands-on notebooks for LLM practitioners
Created 2024-04-04 · 51 commits to main branch, last one 27 days ago

The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators"
Created 2023-10-12 · 12 commits to main branch, last one 5 months ago