68 results found
The LLM Evaluation Framework
Created
2023-08-10
4,638 commits to main branch, last one 10 hours ago
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI
Created
2023-08-15
639 commits to main branch, last one 5 days ago
Guide to Building White-Box Large Models: a fully hand-built Tiny-Universe
Created
2024-04-06
138 commits to main branch, last one about a month ago
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created
2019-06-19
221 commits to master branch, last one about a year ago
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Created
2024-01-26
351 commits to main branch, last one 11 hours ago
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Created
2024-10-09
73 commits to main branch, last one 2 months ago
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
Created
2021-08-20
72 commits to master branch, last one 7 months ago
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Created
2020-03-13
1,150 commits to master branch, last one 8 months ago
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Created
2018-06-19
107 commits to master branch, last one about a month ago
📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
Created
2020-04-06
98 commits to master branch, last one 7 months ago
A Neural Framework for MT Evaluation
Created
2020-05-28
561 commits to master branch, last one 5 days ago
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
Created
2020-06-02
284 commits to master branch, last one 9 months ago
Data-Driven Evaluation for LLM-Powered Applications
Created
2023-12-08
106 commits to main branch, last one 2 months ago
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Created
2010-07-06
2,160 commits to master branch, last one about a year ago
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Created
2021-10-17
24 commits to main branch, last one 8 months ago
[RAL'25] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.
Created
2022-12-30
69 commits to main branch, last one 9 days ago
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Created
2019-10-23
5 commits to master branch, last one 3 years ago
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Created
2023-10-23
343 commits to main branch, last one 4 months ago
Python SDK for running evaluations on LLM generated responses
Created
2023-11-22
784 commits to main branch, last one 4 days ago
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Created
2023-06-15
366 commits to main branch, last one about a year ago
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Created
2024-03-16
93 commits to master branch, last one about a month ago
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
Created
2020-02-21
6 commits to master branch, last one 4 years ago
A Python wrapper for the ROUGE summarization evaluation package
Created
2014-01-14
38 commits to master branch, last one 5 years ago
A natural language processing project that performs sentiment analysis, classifying tweets as positive or negative using machine learning models for classification, text mining, text a...
Created
2019-03-31
10 commits to master branch, last one 5 years ago
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.
Created
2024-01-09
1,491 commits to main branch, last one 25 days ago
An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity
Created
2018-05-17
65 commits to master branch, last one 9 months ago
Awesome diffusion Video-to-Video (V2V): a collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, along with video editing benchmark code.
Created
2024-06-15
22 commits to main branch, last one 2 months ago
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Created
2020-05-27
15 commits to master branch, last one about a year ago
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
Created
2023-06-15
2,723 commits to main branch, last one 5 days ago
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Created
2020-07-08
66 commits to master branch, last one 6 months ago