Statistics for topic datasets
RepositoryStats tracks 627,864 GitHub repositories, of which 373 are tagged with the datasets topic. The most common primary language for repositories using this topic is Python (160). Other languages include Jupyter Notebook (40).
Stargazers over time for topic datasets
Most starred repositories for topic datasets
Trending repositories for topic datasets
Label Studio is a multi-type data labeling and annotation tool with standardized output format
AKShare is an elegant and simple financial data interface library for Python, built for human beings! Open-source financial data interface library
Langtrace 🔍 is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vectorD...
Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.
Major Europe leagues data (England, Spain, Italy, Germany and France)
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
A list of public EMG datasets and their papers, with a focus on raw EMG signals.
[AAAI 2025 Oral🚁] Game4Loc: A UAV Geo-Localization Benchmark from Game Data
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
A repository of social event detection datasets for the SocialED Python library
Healthcare and biomedical datasets, for AI/ML
[AAAI 2025]👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high f...
A curated list of Place Recognition methods, datasets, and various algorithms for LiDAR
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨
An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and also run Model Inference, Finetune and Application Spaces.
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.