Search Results - RepositoryStats

deepeval confident-ai

492

5.8k

apache-2.0

27

The LLM Evaluation Framework

llm-evaluation evaluation-metrics evaluation-framework llm-evaluation-metrics llm-evaluation-framework

Created 2023-08-10

4,638 commits to main branch, last one 10 hours ago

agentops AgentOps-AI

374

4.2k

mit

43

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI

ai llm groq agent evals crewai ollama openai autogen mistral agentops anthropic langchain agents-sdk openai-agents cost-estimation evaluation-metrics

Created 2023-08-15

639 commits to main branch, last one 5 days ago

tiny-universe datawhalechina

275

2.6k

unknown

23

"A Guide to Building White-Box Large Models": a fully hand-built Tiny-Universe

rag qwen agent llama diffusion transformers evaluation-metrics

Created 2024-04-06

138 commits to main branch, last one about a month ago

AB3DMOT xinshuoweng

405

1.7k

other

50

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

kitti 3d-mot 3d-multi kitti-3d robotics tracking real-time evaluation 3d-tracking computer-vision machine-learning 2d-mot-evaluation evaluation-metrics multi-object-tracking 3d-multi-object-tracking

Created 2019-06-19

221 commits to master branch, last one about a year ago

lighteval huggingface

211

1.4k

mit

28

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

evaluation huggingface evaluation-metrics evaluation-framework

Created 2024-01-26

351 commits to main branch, last one 11 hours ago

evaluation-guidebook huggingface

70

1.1k

other

10

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

llm tutorial guidebook evaluation machine-learning evaluation-metrics large-language-models

Created 2024-10-09

73 commits to main branch, last one 2 months ago

rliable google-research

48

816

apache-2.0

9

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

rl google benchmarking machine-learning evaluation-metrics reinforcement-learning

Created 2021-08-20

72 commits to master branch, last one 7 months ago

OCTIS MIND-Lab

110

753

mit

13

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

nlp nlproc nlp-library topic-models topic-modeling evaluation-metrics neural-topic-models bayesian-optimization hyperparameter-search hyperparameter-tuning latent-semantic-analysis hyperparameter-optimization latent-dirichlet-allocation natural-language-processing non-negative-matrix-factorization

Created 2020-03-13

1,150 commits to master branch, last one 8 months ago

jiwer jitsi

101

704

apache-2.0

15

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

wer python3 speech-to-text word-error-rate evaluation-metrics automatic-speech-recognition

Created 2018-06-19

107 commits to master branch, last one about a month ago

image-similarity-measures nekhtiari

70

609

mit

12

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

p1 image metrics processing machine-learning evaluation-metrics

Created 2020-04-06

98 commits to master branch, last one 7 months ago

COMET Unbabel

88

562

apache-2.0

19

A Neural Framework for MT Evaluation

nlp machine-learning evaluation-metrics machine-translation artificial-intelligence natural-language-processing

Created 2020-05-28

561 commits to master branch, last one 5 days ago

ranx AmenRa

26

534

mit

10

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

numba python comparison evaluation metasearch data-fusion rank-fusion score-fusion ranking-metrics evaluation-metrics recommender-systems information-retrieval information-retrieval-metrics information-retrieval-evaluation

Created 2020-06-02

284 commits to master branch, last one 9 months ago

continuous-eval relari-ai

33

484

apache-2.0

4

Data-Driven Evaluation for LLM-Powered Applications

rag llmops llm-evaluation evaluation-metrics evaluation-framework information-retrieval retrieval-augmented-generation

Created 2023-12-08

106 commits to main branch, last one 2 months ago

pynlpl proycon

67

477

gpl-3.0

30

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...

nlp folia python library linguistics nlp-library text-processing machine-learning search-algorithms evaluation-metrics language-modelling computational-linguistics natural-language-processing

Created 2010-07-06

2,160 commits to master branch, last one about a year ago

SpecVQGAN v-iashin

39

358

mit

8

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

gan vas bmvc audio video vqvae melgan pytorch vggsound multi-modal transformer video-features audio-generation evaluation-metrics video-understanding

Created 2021-10-17

24 commits to main branch, last one 8 months ago

Cloud_Map_Evaluation JokerJohn

25

337

unknown

7

[RAL' 2025] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.

slam open3d robotics point-cloud map-evaluation lidar-point-cloud evaluation-metrics slam-benchmarcking wasserstein-distance pointcloud-registration

Created 2022-12-30

69 commits to main branch, last one 9 days ago

factCC salesforce

30

292

bsd-3-clause

8

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

evaluation-metrics text-summarization

Created 2019-10-23

5 commits to master branch, last one 3 years ago

tonic_validate TonicAI

30

291

mit

14

Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

llm rag llms llmops evaluation-metrics evaluation-framework large-language-models retrieval-augmented-generation

Created 2023-10-23

343 commits to main branch, last one 4 months ago

athina-evals athina-ai

17

274

unknown

5

Python SDK for running evaluations on LLM generated responses

llmops llm-ops llm-eval evaluation llm-evaluation evaluation-metrics evaluation-framework llm-evaluation-toolkit

Created 2023-11-22

784 commits to main branch, last one 4 days ago

LRV-Instruction FuxiaoLiu

13

272

bsd-3-clause

12

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

gpt vqa iclr gpt-4 llama llava vicuna vision chatgpt iclr2024 evaluation multimodal hallucination object-detection foundation-models evaluation-metrics prompt-engineering vision-and-language

Created 2023-06-15

366 commits to main branch, last one about a year ago

Awesome-Evaluation-of-Visual-Generation ziqihuangg

14

263

unknown

6

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

awesome benchmark evaluation image-generation video-generation evaluation-system generative-models evaluation-metrics

Created 2024-03-16

93 commits to master branch, last one about a month ago

generative-evaluation-prdc clovaai

28

254

mit

8

Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.

icml recall fidelity icml2020 diversity icml-2020 precision evaluation deep-learning generative-model machine-learning evaluation-metrics generative-adversarial-network

Created 2020-02-21

6 commits to master branch, last one 4 years ago

pyrouge bheinzerling

71

251

mit

3

A Python wrapper for the ROUGE summarization evaluation package

nlp rouge summarization evaluation-metrics

Created 2014-01-14

38 commits to master branch, last one 5 years ago

Twitter-Sentiment-Analysis sharmaroshan

124

244

gpl-3.0

3

A natural language processing problem in which sentiment analysis is performed by separating positive tweets from negative tweets using machine learning models for classification, text mining, text a...

eda nlp hashtags wordcloud bag-of-words datacleaning data-analysis classification count-vectorizer cross-validation machine-learning data-visualization evaluation-metrics sentiment-analysis

Created 2019-03-31

10 commits to master branch, last one 5 years ago

foundation-model-benchmarking-tool aws-samples

41

232

mit-0

8

Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.

g5 g6 p5 g6e p4d llama2 llama3 bedrock deepseek trainium benchmark sagemaker inferentia deepseek-r1 benchmarking generative-ai foundation-models evaluation-metrics

Created 2024-01-09

1,491 commits to main branch, last one 25 days ago

NER-Evaluation davidsbatista

49

220

mit

10

An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity

ner semeval crfsuite semeval-2013 ner-evaluation notebook-jupyter evaluation-metrics named-entity-recognition

Created 2018-05-17

65 commits to master branch, last one 9 months ago

awesome-diffusion-v2v wenhao728

9

208

mit

5

Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus video editing benchmark code.

survey benchmark video-editing video-to-video diffusion-models evaluation-metrics

Created 2024-06-15

22 commits to main branch, last one 2 months ago

CLEval clovaai

28

185

mit

9

CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks

end-to-end-ocr text-detection text-recognition evaluation-metrics text-detection-recognition

Created 2020-05-27

15 commits to master branch, last one about a year ago

unitxt IBM

52

182

apache-2.0

19

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

ai llm nlp data mlops python vision datasets evaluation nlp-library evaluation-metrics

Created 2023-06-15

2,723 commits to main branch, last one 5 days ago

PySODEvalToolkit lartpang

21

175

mit

1

PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection

Created 2020-07-08

66 commits to master branch, last one 6 months ago