64 results found

327
4.0k
apache-2.0
23
The LLM Evaluation Framework
Created 2023-08-10
3,938 commits to main branch, last one 20 hours ago
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including CrewAI, LangChain, and AutoGen.
Created 2023-08-15
554 commits to main branch, last one a day ago
403
1.7k
other
49
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created 2019-06-19
221 commits to master branch, last one about a year ago
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Created 2024-10-09
72 commits to main branch, last one 14 days ago
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Created 2024-01-26
266 commits to main branch, last one 21 hours ago
48
785
apache-2.0
11
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
Created 2021-08-20
72 commits to master branch, last one 4 months ago
106
737
mit
15
OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)
Created 2020-03-13
1,150 commits to master branch, last one 4 months ago
98
653
apache-2.0
15
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Created 2018-06-19
90 commits to master branch, last one about a month ago
Implementation of eight evaluation metrics to assess the similarity between two images: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
Created 2020-04-06
98 commits to master branch, last one 4 months ago
82
515
apache-2.0
20
A Neural Framework for MT Evaluation
Created 2020-05-28
554 commits to master branch, last one 12 days ago
27
496
mit
11
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
Created 2020-06-02
284 commits to master branch, last one 5 months ago
68
479
gpl-3.0
31
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Created 2010-07-06
2,160 commits to master branch, last one about a year ago
31
455
apache-2.0
4
Data-Driven Evaluation for LLM-Powered Applications
Created 2023-12-08
96 commits to main branch, last one 3 months ago
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
Created 2021-10-17
24 commits to main branch, last one 5 months ago
31
289
bsd-3-clause
10
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Created 2019-10-23
5 commits to master branch, last one 3 years ago
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Created 2023-10-23
343 commits to main branch, last one about a month ago
13
262
bsd-3-clause
11
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Created 2023-06-15
366 commits to main branch, last one 9 months ago
A Python wrapper for the ROUGE summarization evaluation package
Created 2014-01-14
38 commits to master branch, last one 5 years ago
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
Created 2020-02-21
6 commits to master branch, last one 4 years ago
Python SDK for running evaluations on LLM generated responses
Created 2023-11-22
635 commits to main branch, last one a day ago
A natural language processing problem in which sentiment analysis is done by separating positive tweets from negative tweets using machine learning models for classification, text mining, text a...
Created 2019-03-31
10 commits to master branch, last one 5 years ago
An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity
Created 2018-05-17
65 commits to master branch, last one 5 months ago
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.
Created 2024-01-09
1,315 commits to main branch, last one 9 hours ago
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Created 2024-03-16
80 commits to master branch, last one 3 months ago
28
184
mit
10
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Created 2020-05-27
15 commits to master branch, last one about a year ago
Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus video editing benchmark code.
Created 2024-06-15
20 commits to main branch, last one 20 days ago
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Created 2020-07-08
66 commits to master branch, last one 2 months ago
47
164
apache-2.0
18
🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation
Created 2023-06-15
2,508 commits to main branch, last one a day ago
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Created 2019-02-22
255 commits to main branch, last one about a month ago
36
159
gpl-3.0
5
Easier Automatic Sentence Simplification Evaluation
Created 2019-03-04
353 commits to master branch, last one about a year ago