Trending repositories for topic datasets
The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....
csghub-server is the backend server for CSGHub, which helps users manage datasets and models and run Model Inference, Finetune, and Application Spaces.
AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!
A list of awesome papers and resources on recommender systems with large language models (LLMs).
Techniques for deep learning with satellite & aerial imagery
[ACL 2023] Reasoning with Language Model Prompting: A Survey
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Curated list of Ukrainian natural language processing (NLP) resources (corpora, pretrained models, libraries, etc.)
A list of datasets, tools, papers and code related to Deepfakes.
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio app...
A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
A Chinese NLP corpus of science fiction: a text corpus of about 4,675 Chinese science fiction novels.
Papers and Datasets on Instruction Tuning and Following. ✨✨✨
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
A curated set of references to useful UK Government datasets
A Package Manager for Machine Learning Datasets and Models.
List of datasets and papers in X-ray security images (Computer vision/Machine Learning)
Major European leagues data (England, Spain, Italy, Germany and France)
🚀🚀🚀 A collection of some awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applica...
OSINT cheat sheet: a list of OSINT tools, wikis, datasets, articles, and books, plus red team OSINT and OSINT tips. This repository will grow over time; it covers research, science, and technology, so use it wisely.
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vectorD...
Grimoire is All You Need for Enhancing Large Language Models
[AAAI 2025]👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high f...
[AAAI 2025 Oral🚁] Game4Loc: A UAV Geo-Localization Benchmark from Game Data
A benchmark fault diagnosis dataset comprising vibration data collected from a gearbox under variable working conditions with intentionally induced faults, encompassing diverse fault severities and typ...
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee...
[NeurIPS 2024 🔥] TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs
Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"
An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
A curated list of Place Recognition methods, datasets, and various algorithms for LiDAR
A repository of datasets paired with rich documentation, data essays, and teaching resources
🎉🎨 Papers, Code, Datasets for Neuroscience and Cognition Science
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
CSGHub is an open-source large model platform, like an on-premises version of Hugging Face. You can easily manage models and datasets, deploy model applications, and set up model finetune or inference j...
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
An open source multi-tool for exploring and publishing data
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation
"Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases" by Jiarui Li and Ye Yuan and Zehua Zhang
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
Resources about solar power systems for data science
WildlifeDatasets: An open-source toolkit for animal re-identification