Statistics for topic datasets
RepositoryStats tracks 579,129 GitHub repositories, of which 356 are tagged with the datasets topic. The most common primary language for repositories using this topic is Python (150). Other languages include Jupyter Notebook (39).
Stargazers over time for topic datasets
Most starred repositories for topic datasets
Trending repositories for topic datasets
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Industrial datasets - datasets for evaluating industrial intrusion detection systems on IPAL.
csghub-server is the backend server for CSGHub, which helps users manage datasets and models and run Model Inference, Finetune, and Application Spaces.
A Chinese science fiction corpus for natural language processing: an archive of the Ark Plan of the Ula Science Fiction Website (乌拉科幻小说网方舟计划存档). Also described as a Chinese science fiction text corpus and text database.
A curated list of peer-reviewed papers on theoretical and practical aspects of drivers' attention used for paper "Attention for Vision-Based Assistive and Automated Driving: A Review of Algorithms and...
OpenABC-D is a large-scale labeled dataset generated by synthesizing open source hardware IPs. This dataset can be used for various graph level prediction problems in chip design.
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Securely share and store AI/ML projects as OCI artifacts in your container registry.
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
A curated list of publicly available datasets for machine learning research in manufacturing.
[TMLR] A curated list of language modeling research for code and related datasets.
CSGHub is an open-source large model platform, similar to an on-premise version of Hugging Face. You can easily manage models and datasets, deploy model applications, and set up model finetune or inference j...
👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vectorD...
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.