23 results found

396 · 4.4k · other · 18
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created 2023-05-18
2,062 commits to main branch, last one 12 hours ago
228 · 3.7k · apache-2.0 · 29
🐢 Open-Source Evaluation & Testing for LLMs and ML models
Created 2022-03-06
9,731 commits to main branch, last one a day ago
230 · 3.4k · mit · 18
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comman...
Created 2023-04-28
1,361 commits to main branch, last one 20 hours ago
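The entry above advertises "simple declarative configs" for prompt testing. As a minimal sketch of that idea (the config shape, field names, and runner here are illustrative assumptions, not any specific tool's schema), test cases and assertions can be plain data that a small runner applies to a model:

```python
# Hypothetical sketch of declarative prompt testing: test cases and
# assertions are data, and a runner applies them to a model. The model
# below is a stub; a real tool would call GPT, Claude, Gemini, etc.

CONFIG = {
    "prompt": "Summarize in one word: {text}",
    "tests": [
        {"vars": {"text": "The sky is blue today."}, "assert": {"contains": "sky"}},
        {"vars": {"text": "Cats sleep a lot."}, "assert": {"contains": "cats"}},
    ],
}

def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM call: echoes the filled-in prompt, lowercased.
    return prompt.lower()

def run_evals(config, model):
    results = []
    for case in config["tests"]:
        output = model(config["prompt"].format(**case["vars"]))
        results.append(case["assert"]["contains"] in output)
    return results

print(run_evals(CONFIG, stub_model))  # → [True, True]
```

Keeping the test suite declarative is what lets such tools diff results across models or prompt revisions without code changes.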
159 · 2.3k · apache-2.0 · 16
The LLM Evaluation Framework
Created 2023-08-10
3,019 commits to main branch, last one a day ago
161 · 968 · mit · 20
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
Created 2023-04-26
7,686 commits to main branch, last one 22 hours ago
20 · 362 · apache-2.0 · 4
Open-Source Evaluation for GenAI Application Pipelines
Created 2023-12-08
89 commits to main branch, last one 8 days ago
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs, aimed at exploring the technical boundaries of generative AI.
Created 2023-04-26
237 commits to main branch, last one 7 days ago
Awesome papers involving LLMs in Social Science.
Created 2023-10-15
81 commits to main branch, last one 10 days ago
Python SDK for running evaluations on LLM generated responses
Created 2023-11-22
455 commits to main branch, last one a day ago
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandab...
Created 2024-04-22
93 commits to main branch, last one 2 days ago
Superpipe - optimized LLM pipelines for structured data
Created 2024-02-07
99 commits to main branch, last one 8 days ago
Framework for LLM evaluation, guardrails and security
Created 2024-03-02
5 commits to main branch, last one 27 days ago
3 · 83 · apache-2.0 · 6
Evaluating LLMs with CommonGen-Lite
Created 2024-01-04
37 commits to main branch, last one 5 months ago
13 · 80 · apache-2.0 · 2
A list of LLM tools & projects
Created 2023-05-09
49 commits to main branch, last one a day ago
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Created 2023-11-19
40 commits to main branch, last one 4 months ago
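The entry above describes multi-aspect, interpretable GPT-based assessment. A hedged sketch of the general "LLM as judge" pattern it implies (aspect names, rubric wording, and the stubbed judge are assumptions for illustration, not that tool's API): each aspect gets its own rubric prompt, and the judge's reply is parsed into a per-aspect score.

```python
# Sketch of multi-aspect LLM-as-judge scoring: one rubric prompt per
# aspect, parsed into an integer score. The judge is a stub; a real
# implementation would call a GPT-style chat API.

ASPECTS = ["factuality", "coherence", "relevance"]

def build_rubric_prompt(aspect: str, question: str, answer: str) -> str:
    return (
        f"Rate the {aspect} of the answer on a 1-5 scale. "
        f"Reply with a single integer.\nQ: {question}\nA: {answer}"
    )

def stub_judge(prompt: str) -> str:
    # Stand-in judge that always answers "4"; a real judge model would
    # read the rubric and grade the answer.
    return "4"

def evaluate(question: str, answer: str, judge) -> dict:
    scores = {}
    for aspect in ASPECTS:
        reply = judge(build_rubric_prompt(aspect, question, answer))
        scores[aspect] = int(reply.strip())  # per-aspect score stays inspectable
    return scores

print(evaluate("What is 2+2?", "4", stub_judge))
# → {'factuality': 4, 'coherence': 4, 'relevance': 4}
```

Keeping one score per named aspect, rather than a single overall grade, is what makes this style of assessment interpretable.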
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
Created 2024-05-27
14 commits to master branch, last one 26 days ago
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Created 2023-11-15
3 commits to main branch, last one 7 months ago
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute, relative and much more. It contains a list of all the availab...
Created 2024-05-11
29 commits to main branch, last one 26 days ago
4 · 43 · apache-2.0 · 2
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
925 commits to main branch, last one 6 days ago
This repository has no description...
Created 2024-06-06
422 commits to main branch, last one 2 days ago
A collection of hands-on notebooks for LLM practitioners
Created 2024-04-04
51 commits to main branch, last one 27 days ago
1 · 28 · unknown · 2
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators"
Created 2023-10-12
12 commits to main branch, last one 5 months ago