55 results found
The LLM Evaluation Framework
Created
2023-08-10
3,019 commits to main branch, last one a day ago
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created
2019-06-19
221 commits to master branch, last one about a year ago
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
Created
2023-08-15
371 commits to main branch, last one 9 hours ago
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
Created
2021-08-20
66 commits to master branch, last one 29 days ago
OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Created
2020-03-13
1,143 commits to master branch, last one 24 days ago
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Created
2018-06-19
87 commits to master branch, last one about a month ago
Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
Created
2020-04-06
95 commits to master branch, last one about a month ago
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Created
2010-07-06
2,160 commits to master branch, last one 9 months ago
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
Created
2024-01-26
126 commits to main branch, last one about a month ago
A Neural Framework for MT Evaluation
Created
2020-05-28
543 commits to master branch, last one about a month ago
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
Created
2020-06-02
270 commits to master branch, last one 7 months ago
Open-Source Evaluation for GenAI Application Pipelines
Created
2023-12-08
89 commits to main branch, last one 8 days ago
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Created
2021-10-17
22 commits to main branch, last one about a year ago
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Created
2019-10-23
5 commits to master branch, last one 2 years ago
A Python wrapper for the ROUGE summarization evaluation package
Created
2014-01-14
38 commits to master branch, last one 5 years ago
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
Created
2020-02-21
6 commits to master branch, last one 3 years ago
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Created
2023-06-15
366 commits to main branch, last one 3 months ago
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Created
2023-10-23
340 commits to main branch, last one 9 days ago
A natural language processing problem in which sentiment analysis is performed by classifying tweets as positive or negative using machine learning models for classification, text mining, text a...
Created
2019-03-31
10 commits to master branch, last one 5 years ago
An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity
Created
2018-05-17
62 commits to master branch, last one 4 months ago
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Created
2020-05-27
15 commits to master branch, last one 8 months ago
Easier Automatic Sentence Simplification Evaluation
Created
2019-03-04
353 commits to master branch, last one 9 months ago
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Created
2020-07-08
63 commits to master branch, last one about a month ago
Python SDK for running evaluations on LLM generated responses
Created
2023-11-22
455 commits to main branch, last one a day ago
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Created
2019-02-22
241 commits to main branch, last one 22 days ago
A fast implementation of bss_eval metrics for blind source separation
Created
2021-10-12
67 commits to main branch, last one 2 years ago
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.
Created
2022-07-03
32 commits to main branch, last one about a year ago
GOM: New Metric for Re-identification. 👉 GOM explicitly balances the effects of retrieval and verification in a single unified metric.
Created
2020-07-10
120 commits to master branch, last one about a year ago
Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)
Created
2019-10-21
97 commits to master branch, last one 3 years ago
[NeurIPS 2023] Official code for "TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models"
Created
2023-09-22
77 commits to main branch, last one 9 months ago