46 results found
Filter by Primary Language:
- Python (37)
- TypeScript (2)
- Jupyter Notebook (2)
- JavaScript (1)
- C# (1)
- Go (1)
- Svelte (1)
- Java (1)
A framework for few-shot evaluation of language models.
Created 2020-08-28 · 3,624 commits to main branch, last one a day ago
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28 · 3,610 commits to main branch, last one 7 hours ago
The LLM Evaluation Framework
Created 2023-08-10 · 4,223 commits to main branch, last one 10 hours ago
🐢 Open-Source Evaluation & Testing for AI & LLM systems
Created 2022-03-06 · 10,216 commits to main branch, last one 10 hours ago
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Created 2024-01-26 · 304 commits to main branch, last one 5 hours ago
This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.
Tags: bpr, knn, bprmf, bprslim, funksvd, deep-learning, neural-network, slimelasticnet, hyperparameters, reproducibility, matrix-completion, recommender-system, evaluation-framework, matrix-factorization, recommendation-system, reproducible-research, collaborative-filtering, hybrid-recommender-system, recommendation-algorithms, content-based-recommendation
Created 2019-04-02 · 63 commits to master branch, last one 3 years ago
Data-Driven Evaluation for LLM-Powered Applications
Created 2023-12-08 · 106 commits to main branch, last one 7 days ago
Metrics to evaluate the quality of responses from your Retrieval Augmented Generation (RAG) applications.
Created 2023-10-23 · 343 commits to main branch, last one 2 months ago
Python SDK for running evaluations on LLM-generated responses
Created 2023-11-22 · 706 commits to main branch, last one 3 days ago
The official evaluation suite and dynamic data release for MixEval.
Created 2024-06-01 · 120 commits to main branch, last one 2 months ago
A research library for automating experiments on Deep Graph Networks
Created 2020-03-21 · 461 commits to main branch, last one 4 months ago
AI Data Management & Evaluation Platform
This repository has been archived.
Created 2022-02-03 · 1,044 commits to main branch, last one about a year ago
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
Created 2024-05-21 · 519 commits to main branch, last one 2 days ago
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
Created 2023-12-14 · 2,030 commits to main branch, last one 6 days ago
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Created 2020-07-08 · 66 commits to master branch, last one 4 months ago
Expressive is a cross-platform expression parsing and evaluation framework. The cross-platform nature is achieved through compiling for .NET Standard so it will run on practically any platform.
Created 2016-06-13 · 291 commits to main branch, last one 4 months ago
Test and evaluate LLMs and model configurations across all the scenarios that matter for your application
Created 2024-03-14 · 218 commits to main branch, last one 8 months ago
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Created 2024-03-28 · 766 commits to main branch, last one 3 days ago
Evaluation suite for large-scale language models.
Created 2021-08-05 · 7 commits to main branch, last one 3 years ago
Multilingual Large Language Models Evaluation Benchmark
Created 2023-08-07 · 18 commits to main branch, last one about a year ago
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
Created 2018-09-10 · 40 commits to master branch, last one about a year ago
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
Created 2024-07-18 · 58 commits to main branch, last one 20 hours ago
LiDAR SLAM comparison and evaluation framework
Created 2021-07-26 · 17 commits to main branch, last one 3 years ago
Industrial-level evaluation benchmarks for Coding LLMs across the full life-cycle of AI-native software development. (Enterprise-grade code LLM evaluation suite, with new benchmarks released on an ongoing basis.)
Created 2023-09-28 · 27 commits to master branch, last one about a year ago
Evaluation framework for oncology foundation models (FMs)
Created 2024-01-16 · 324 commits to main branch, last one 2 days ago
This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
Created 2022-10-28 · 10 commits to main branch, last one about a year ago
A toolkit for auto-generation of OpenAI Gym environments from RDDL description files.
Created 2022-07-10 · 2,103 commits to main branch, last one 15 days ago
The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"
Created 2019-04-13 · 10 commits to master branch, last one 4 months ago
Vectory provides a collection of tools to track and compare embedding versions.
Created 2022-09-30 · 45 commits to main branch, last one 2 years ago
The official implementation of ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now". This work introduces one fast and effe...
Created 2023-10-17 · 68 commits to main branch, last one 2 months ago