Statistics for topic datasets
RepositoryStats tracks 518,986 GitHub repositories, of which 326 are tagged with the datasets topic. The most common primary language for repositories using this topic is Python (136). Other languages include Jupyter Notebook (36).
Stargazers over time for topic datasets
Most starred repositories for topic datasets
Trending repositories for topic datasets
Label Studio is a multi-type data labeling and annotation tool with standardized output format
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Langtrace 🔍 is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vectorD...
Your all-in-one platform to build and use AI apps effortlessly on your own computer.
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
Label Studio is a multi-type data labeling and annotation tool with standardized output format
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Your all-in-one platform to build and use AI apps effortlessly on your own computer.
📊 Adana - 1-click analytical dashboard for OSINT researchers
Langtrace 🔍 is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vectorD...
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee...
Langtrace 🔍 is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vectorD...
🤖 Build AI applications with confidence ✅ Understand how your users are using your LLM-app ✅ Get a full picture of the quality performance of your LLM-app ✅ Collaborate with your stakeholders in ONE ...
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support ...
A curated list of language modeling research for code and related datasets.
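CSGHub is an open-source large model assets platform, similar to an on-premise Hugging Face, which helps to manage datasets, model files, code and more. CSGHub is an open-source, trusted asset management platform for large models that helps users govern the assets involved in the lifecycle of LLMs and LLM applications (...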
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
Label Studio is a multi-type data labeling and annotation tool with standardized output format
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support ...
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
Tools for easing the handoff between AI/ML and App/SRE teams.
[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models