36 results found

A framework for few-shot evaluation of language models.
Created 2020-08-28
3,415 commits to main branch, last one 23 hours ago
223 · 3.3k · MIT · 18
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comman...
Created 2023-04-28
1,331 commits to main branch, last one 9 hours ago
153 · 2.2k · Apache-2.0 · 17
The LLM Evaluation Framework
Created 2023-08-10
2,995 commits to main branch, last one 4 hours ago
This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.
Created 2019-04-02
63 commits to master branch, last one 2 years ago
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data-processing library datatrove and the LLM training library nanotron.
Created 2024-01-26
126 commits to main branch, last one about a month ago
19 · 359 · Apache-2.0 · 4
Open-Source Evaluation for GenAI Application Pipelines
Created 2023-12-08
90 commits to main branch, last one a day ago
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Created 2023-10-23
340 commits to main branch, last one 2 days ago
12 · 213 · BSD-3-Clause · 7
A research library for automating experiments on Deep Graph Networks
Created 2020-03-21
447 commits to main branch, last one a day ago
9 · 212 · MIT · 8
AI Data Management & Evaluation Platform
This repository has been archived.
Created 2022-02-03
1,044 commits to main branch, last one 8 months ago
Expressive is a cross-platform expression parsing and evaluation framework. Cross-platform support is achieved by compiling for .NET Standard, so it runs on practically any platform.
Created 2016-06-13
288 commits to main branch, last one 27 days ago
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Created 2020-07-08
63 commits to master branch, last one about a month ago
Python SDK for running evaluations on LLM generated responses
Created 2023-11-22
445 commits to main branch, last one 2 days ago
Test and evaluate LLMs and model configurations across all the scenarios that matter for your application.
Created 2024-03-14
218 commits to main branch, last one 29 days ago
13 · 122 · Apache-2.0 · 5
Evaluation suite for large-scale language models.
Created 2021-08-05
7 commits to main branch, last one 2 years ago
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
Created 2023-12-14
987 commits to main branch, last one 10 days ago
LiDAR SLAM comparison and evaluation framework
Created 2021-07-26
17 commits to main branch, last one 2 years ago
The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"
Created 2019-04-13
9 commits to master branch, last one 2 years ago
This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
Created 2022-10-28
10 commits to main branch, last one 7 months ago
Multilingual Large Language Models Evaluation Benchmark
Created 2023-08-07
18 commits to main branch, last one 10 months ago
Vectory provides a collection of tools to track and compare embedding versions.
Created 2022-09-30
45 commits to main branch, last one about a year ago
Industrial-level evaluation benchmarks for coding LLMs across the full life cycle of AI-native software development (enterprise-grade code LLM evaluation suite, continuously being opened up).
Created 2023-09-28
27 commits to master branch, last one 5 months ago
OD-test: A Less Biased Evaluation of Out-of-Distribution (Outlier) Detectors (PyTorch)
Created 2018-09-12
40 commits to master branch, last one 4 years ago
Power Flows DMN - Powerful decisions and rules engine
Created 2018-08-03
253 commits to master branch, last one 5 years ago
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Created 2024-03-28
375 commits to main branch, last one 2 days ago
Python-based tools for pre- and post-processing, validating, and curating spike sorting datasets.
This repository has been archived.
Created 2018-09-28
1,509 commits to master branch, last one 2 years ago
2 · 48 · Apache-2.0 · 5
Evaluation framework for oncology foundation models (FMs)
Created 2024-01-16
228 commits to main branch, last one 2 days ago
1 · 44 · Apache-2.0 · 6
Simulator for training and evaluation of Recommender Systems
Created 2022-11-28
29 commits to main branch, last one 8 months ago
Framework to evaluate Trajectory Classification Algorithms
Created 2022-10-01
158 commits to main branch, last one 9 months ago