Statistics for topic benchmark
RepositoryStats tracks 579,129 GitHub repositories, of which 751 are tagged with the benchmark topic. The most common primary language for repositories using this topic is Python (300). Other languages include C++ (52), Jupyter Notebook (51), Go (48), JavaScript (27), C (23), Java (22), TypeScript (22), Shell (20), and Rust (18).
Stargazers over time for topic benchmark
Most starred repositories for topic benchmark
Trending repositories for topic benchmark
Ohayou(おはよう), HTTP load generator, inspired by rakyll/hey with tui animation.
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
[NeurIPS 2024] Touchstone - Benchmarking AI on 5,172 o.o.d. CT volumes and 9 anatomical structures
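VPS test scripts | VPS performance testing (basic VPS information, I/O performance, global speed tests, ping, return-route tests), BBR acceleration script (a TCP congestion-control algorithm that speeds up TCP), three-network speed test script (speed tests across China's three major carriers, streaming-service detection), route testing (one-click return-route test script for Linux VPS)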
A simple PHP script that helps you compare raw performance across servers and PHP versions
The Dawn of Video Generation: Preliminary Explorations with SORA-like Models
This is a benchmark for domain-generalization-based fault diagnosis (code related to domain generalization)
[CVPR 2024 Extension] 160K volumes (42M slices) datasets, new segmentation datasets, 31M-1.2B pre-trained models, various pre-training recipes, 50+ downstream tasks implementation
This repo contains the code and data for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks"
VPS Fusion Monster Server Test Script (aiming to be the most comprehensive all-in-one server testing script)
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
🔥 Aurora Series: A more efficient multimodal large language model series for video.
TSB-AD: Towards A Reliable Time-Series Anomaly Detection Benchmark
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
Official code repository of CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph