Statistics for topic dataset
RepositoryStats tracks 530,574 GitHub repositories; of these, 1,048 are tagged with the dataset topic. The most common primary language for repositories using this topic is Python (551). Other languages include Jupyter Notebook (139), C++ (25), JavaScript (22), HTML (16), MATLAB (14), and R (12).
Stargazers over time for topic dataset
Most starred repositories for topic dataset
Trending repositories for topic dataset
Label Studio is a multi-type data labeling and annotation tool with standardized output format
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A quick guide (especially) for trending instruction finetuning datasets
The history files when recording human interaction while solving ARC tasks
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Air Pollution Image Dataset from India and Nepal
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
esProc SPL is a scripting language for data processing, with well-designed rich library functions and powerful syntax, which can be executed in a Java program through JDBC interface and computing inde...
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, et...
Generate textbook-quality synthetic LLM pretraining data
Dataset Helper: a program to automatically select, rescale, and tag datasets (composed of images and text) for machine learning training.
[ICLR 2024] Supervised Pre-Trained 3D Models for Medical Image Analysis
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs, aiming to explore the technical boundaries of generative AI.