55 results found

159
2.3k
apache-2.0
16
The LLM Evaluation Framework
Created 2023-08-10
3,019 commits to main branch, last one a day ago
401
1.6k
other
48
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created 2019-06-19
221 commits to master branch, last one about a year ago
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
Created 2023-08-15
371 commits to main branch, last one 9 hours ago
43
720
apache-2.0
11
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
Created 2021-08-20
66 commits to master branch, last one 29 days ago
97
700
mit
14
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Created 2020-03-13
1,143 commits to master branch, last one 24 days ago
92
566
apache-2.0
15
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Created 2018-06-19
87 commits to master branch, last one about a month ago
📈 Implementation of eight evaluation metrics to assess the similarity between two images: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
Created 2020-04-06
95 commits to master branch, last one about a month ago
67
477
gpl-3.0
31
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Created 2010-07-06
2,160 commits to master branch, last one 9 months ago
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
Created 2024-01-26
126 commits to main branch, last one about a month ago
71
432
apache-2.0
17
A Neural Framework for MT Evaluation
Created 2020-05-28
543 commits to master branch, last one about a month ago
22
372
mit
11
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
Created 2020-06-02
270 commits to master branch, last one 7 months ago
20
362
apache-2.0
4
Open-Source Evaluation for GenAI Application Pipelines
Created 2023-12-08
89 commits to main branch, last one 8 days ago
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
Created 2021-10-17
22 commits to main branch, last one about a year ago
31
270
bsd-3-clause
10
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Created 2019-10-23
5 commits to master branch, last one 2 years ago
A Python wrapper for the ROUGE summarization evaluation package
Created 2014-01-14
38 commits to master branch, last one 5 years ago
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
Created 2020-02-21
6 commits to master branch, last one 3 years ago
14
230
bsd-3-clause
11
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Created 2023-06-15
366 commits to main branch, last one 3 months ago
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Created 2023-10-23
340 commits to main branch, last one 9 days ago
A natural language processing problem in which sentiment analysis classifies tweets as positive or negative using machine learning models for classification, text mining, text a...
Created 2019-03-31
10 commits to master branch, last one 5 years ago
An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity
Created 2018-05-17
62 commits to master branch, last one 4 months ago
28
184
mit
10
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Created 2020-05-27
15 commits to master branch, last one 8 months ago
36
154
gpl-3.0
6
Easier Automatic Sentence Simplification Evaluation
Created 2019-03-04
353 commits to master branch, last one 9 months ago
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Created 2020-07-08
63 commits to master branch, last one about a month ago
Python SDK for running evaluations on LLM generated responses
Created 2023-11-22
455 commits to main branch, last one a day ago
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Created 2019-02-22
241 commits to main branch, last one 22 days ago
A fast implementation of bss_eval metrics for blind source separation
Created 2021-10-12
67 commits to main branch, last one 2 years ago
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.
Created 2022-07-03
32 commits to main branch, last one about a year ago
GOM: New Metric for Re-identification. 👉 GOM explicitly balances the effects of retrieval and verification in a single unified metric.
Created 2020-07-10
120 commits to master branch, last one about a year ago
Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)
Created 2019-10-21
97 commits to master branch, last one 3 years ago
NeurIPS 2023 - TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models Official Code
Created 2023-09-22
77 commits to main branch, last one 9 months ago