Statistics for topic dataset
RepositoryStats tracks 579,129 GitHub repositories; of these, 1,130 are tagged with the dataset topic. The most common primary language for repositories using this topic is Python (597). Other languages include Jupyter Notebook (151), C++ (25), JavaScript (23), HTML (19), MATLAB (15), and R (15).
Stargazers over time for topic dataset
Most starred repositories for topic dataset
Trending repositories for topic dataset
Label Studio is a multi-type data labeling and annotation tool with standardized output format
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
A taxonomy of industrial anomaly detection methods and datasets (updating).
Transformer-based fNIRS Classification. Paper: Transformer Model for Functional Near-Infrared Spectroscopy Classification
Label Studio is a multi-type data labeling and annotation tool with standardized output format
A taxonomy of industrial anomaly detection methods and datasets (updating).
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
A taxonomy of industrial anomaly detection methods and datasets (updating).
We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
Transformer-based fNIRS Classification. Paper: Transformer Model for Functional Near-Infrared Spectroscopy Classification
Label Studio is a multi-type data labeling and annotation tool with standardized output format
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
[ECCV 2024 Oral] PetFace: A Large-Scale Dataset and Benchmark for Animal Identification https://arxiv.org/abs/2407.13555
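This repository releases a bearing failure dataset that can support intelligent fault diagnosis research (an open-source bearing dataset collected in the authors' laboratory, covering both steady and time-varying rotational speeds).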
Code repository for the ECCV paper "MSD: A Benchmark Dataset for Floor Plan of Building Complexes".
[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
This repository contains a reading list of papers on Time Series Segmentation. This repository is still being continuously improved.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
[ICLR 2024 Oral] Supervised Pre-Trained 3D Models for Medical Image Analysis (9,262 CT volumes + 25 annotated classes)
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.