61 results found
Filter by Primary Language:
- Python (49)
- Jupyter Notebook (7)
- HTML (1)
- JavaScript (1)
- Roff (1)
- +
The LLM Evaluation Framework
Created 2023-08-10 · 3,764 commits to main branch, last one 17 hours ago
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
Created 2023-08-15 · 518 commits to main branch, last one 2 days ago
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created 2019-06-19 · 221 commits to master branch, last one about a year ago
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Created 2024-01-26 · 227 commits to main branch, last one a day ago
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Created 2024-10-09 · 67 commits to main branch, last one 9 days ago
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
Created 2021-08-20 · 72 commits to master branch, last one 3 months ago
OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)
Created 2020-03-13 · 1,150 commits to master branch, last one 3 months ago
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Created 2018-06-19 · 90 commits to master branch, last one 13 days ago
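The entry above concerns word error rate (WER) for speech-to-text evaluation. As a minimal sketch of the metric itself (the `wer` function name and plain-Python implementation are illustrative, not this library's API): WER is the word-level Levenshtein distance between reference and hypothesis, divided by the number of reference words.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    over the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# Two words dropped from a six-word reference: WER = 2/6
print(wer("the cat sat on the mat", "the cat sat mat"))
```

Real implementations add normalization options (casing, punctuation) and related measures such as match error rate; this sketch covers only the core computation.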
📈 Implementation of eight evaluation metrics to assess the similarity between two images: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
Created 2020-04-06 · 98 commits to master branch, last one 3 months ago
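Of the eight image-similarity metrics listed above, RMSE and PSNR are the simplest to state. A minimal sketch over flat pixel lists (function names and the list-based representation are illustrative; libraries typically operate on NumPy arrays):

```python
import math

def rmse(a, b):
    """Root mean squared error between two equal-length flat pixel lists."""
    assert len(a) == len(b), "images must have the same number of pixels"
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in decibels; higher means more similar.
    PSNR = 20 * log10(MAX / RMSE), infinite for identical images."""
    e = rmse(a, b)
    if e == 0:
        return float("inf")
    return 20 * math.log10(max_val / e)

# Maximally different single pixel: RMSE = 255, so PSNR = 0 dB
print(psnr([0], [255]))
```

The other metrics in the list (SSIM, FSIM, SAM, etc.) are structural or spectral measures and need windowed statistics rather than a per-pixel reduction, so they are not sketched here.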
A Neural Framework for MT Evaluation
Created 2020-05-28 · 547 commits to master branch, last one 4 months ago
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Created 2010-07-06 · 2,160 commits to master branch, last one about a year ago
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
Created 2020-06-02 · 284 commits to master branch, last one 4 months ago
Data-Driven Evaluation for LLM-Powered Applications
Created 2023-12-08 · 96 commits to main branch, last one 2 months ago
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Created 2021-10-17 · 24 commits to main branch, last one 4 months ago
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Created 2019-10-23 · 5 commits to master branch, last one 3 years ago
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Created 2023-10-23 · 342 commits to main branch, last one 4 months ago
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Created 2023-06-15 · 366 commits to main branch, last one 8 months ago
A Python wrapper for the ROUGE summarization evaluation package
Created 2014-01-14 · 38 commits to master branch, last one 5 years ago
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
Created 2020-02-21 · 6 commits to master branch, last one 4 years ago
A natural language processing problem: sentiment analysis that classifies positive tweets versus negative tweets using machine learning models for classification, text mining, text a...
Created 2019-03-31 · 10 commits to master branch, last one 5 years ago
Python SDK for running evaluations on LLM generated responses
Created 2023-11-22 · 568 commits to main branch, last one a day ago
An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9: evaluation is not at the tag/token level but considers all the tokens that are part of the named entity
Created 2018-05-17 · 65 commits to master branch, last one 4 months ago
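The distinction drawn above, entity-level rather than token-level scoring, can be sketched for the strictest of the SemEval'13 matching schemes, where a prediction counts only if both the span boundaries and the type match exactly (the `entity_prf` function and the `(start, end, type)` span encoding are illustrative assumptions, not this library's API; the real schemes also cover partial and type-only matches):

```python
def entity_prf(gold, pred):
    """Strict entity-level precision, recall, and F1.

    gold, pred: sets of (start, end, type) spans. A predicted entity is a
    true positive only if an identical span with an identical type exists
    in the gold set; partially overlapping spans score nothing.
    """
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One exact match, one boundary mismatch: P = R = F1 = 0.5,
# even though most tokens of the second entity were tagged correctly.
gold = {(0, 2, "PER"), (5, 7, "LOC")}
pred = {(0, 2, "PER"), (5, 6, "LOC")}
print(entity_prf(gold, pred))
```

A token-level scorer would give the second prediction partial credit; the entity-level view treats it as a miss, which is the point of this family of metrics.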
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.
Created 2024-01-09 · 1,145 commits to main branch, last one 19 hours ago
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Created 2024-03-16 · 80 commits to master branch, last one about a month ago
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Created 2020-05-27 · 15 commits to master branch, last one about a year ago
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Created 2020-07-08 · 66 commits to master branch, last one about a month ago
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
Created 2023-06-15 · 2,435 commits to main branch, last one 21 hours ago
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Created 2019-02-22 · 255 commits to main branch, last one 9 days ago
Easier Automatic Sentence Simplification Evaluation
Created 2019-03-04 · 353 commits to master branch, last one about a year ago
Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus a video editing benchmark codebase.
Created 2024-06-15 · 15 commits to main branch, last one about a month ago