Statistics for topic datasets
RepositoryStats tracks 643,414 GitHub repositories, of which 378 are tagged with the datasets topic. The most common primary language for repositories using this topic is Python (162). Other languages include Jupyter Notebook (41).
Stargazers over time for topic datasets
Most starred repositories for topic datasets
Trending repositories for topic datasets
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (Open-source financial data interface library)
Label Studio is a multi-type data labeling and annotation tool with standardized output format
The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨
🎨 IMAGGarment-1: Fine-Grained Garment Generation with Controllable Structure, Color, and Logo. It supports precise and customizable garment synthesis guided by multi-conditions (e.g., sketch, color,...
Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.
The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨
🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applic...
Label Studio is a multi-type data labeling and annotation tool with standardized output format
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (Open-source financial data interface library)
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
🎨 IMAGGarment-1: Fine-Grained Garment Generation with Controllable Structure, Color, and Logo. It supports precise and customizable garment synthesis guided by multi-conditions (e.g., sketch, color,...
Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
Label Studio is a multi-type data labeling and annotation tool with standardized output format
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (Open-source financial data interface library)
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.
A list of public EMG datasets and their papers, with a focus on raw EMG signals.
This repository collects existing KGQA datasets in the format of the 🤗 Hugging Face datasets library (https://github.com/huggingface/datasets), aiming to provide easy-to-use access to them.
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
[AAAI 2025]👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high f...
Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.
Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"
Label Studio is a multi-type data labeling and annotation tool with standardized output format
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (Open-source financial data interface library)
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨
csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and run model inference, fine-tuning, and Application Spaces.
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vectorD...
A repository of datasets paired with rich documentation, data essays, and teaching resources