Statistics for topic dataset
RepositoryStats tracks 650,729 GitHub repositories, of which 1,253 are tagged with the dataset topic. The most common primary language for repositories using this topic is Python (667). Other languages include Jupyter Notebook (165), C++ (25), JavaScript (23), HTML (19), R (16), MATLAB (15), TypeScript (12), and Shell (11).
Stargazers over time for topic dataset
Most starred repositories for topic dataset
Trending repositories for topic dataset
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
pix2tex: Using a ViT to convert images of equations into LaTeX code.
[NeurIPS'24 Spotlight] Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts
[IEEE JSTARS 2024] CV-Cities: Advancing Cross-view Geo-localization in Global Cities
This is a repository that includes papers, code, and datasets about domain generalization-based fault diagnosis and prognosis.
MineStudio: A Streamlined Package for Minecraft AI Agent Development
Label Studio is a multi-type data labeling and annotation tool with standardized output format
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Paper list and datasets for industrial image anomaly/defect detection (updating).
[IEEE JSTARS 2024] CV-Cities: Advancing Cross-view Geo-localization in Global Cities
[NeurIPS'24 Spotlight] Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts
✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework
SEA is an automated paper review framework capable of generating comprehensive and high-quality review feedback with high consistency for papers, thereby assisting researchers in improving the quality...
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
pix2tex: Using a ViT to convert images of equations into LaTeX code.
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
Official repository for "NoLiMa: Long-Context Evaluation Beyond Literal Matching"
The world's first roller coaster SLAM dataset
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
This is a continuously updated handbook for readers to easily track the latest Text-to-SQL techniques in the literature and provide practical guidance for researchers and practitioners. If we missed a...
Label Studio is a multi-type data labeling and annotation tool with standardized output format
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
[NeurIPS'24 Spotlight] Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
The history files recorded from human interaction while solving ARC tasks