Trending repositories for topic dataset
Label Studio is a multi-type data labeling and annotation tool with a standardized output format.
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A taxonomy of industrial anomaly detection methods and datasets (updating).
A quick guide, focused on trending instruction fine-tuning datasets.
Paper list and datasets for industrial image anomaly/defect detection (continuously updated).
📈 Currently the largest collection of industrial defect detection databases and papers. Continuously summarizes important open-source datasets and key papers in the field of surface defect research.
Documentation on how to access and use the Quick, Draw! Dataset.
A curated list of radar datasets and resources on detection, tracking, and fusion.
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
The Electricity Transformer dataset was collected to support further investigation of the long-sequence forecasting problem.
Transformer-based fNIRS Classification. Paper: Transformer Model for Functional Near-Infrared Spectroscopy Classification
The world's first roller coaster SLAM dataset
📍TextSLAM: Visual SLAM with Semantic Planar Text Features. (ICRA2020 & TPAMI2023)
This repository includes papers, code, and datasets on domain generalization-based fault diagnosis and prognosis (continuously updated).
A curated list of peer-reviewed papers on theoretical and practical aspects of drivers' attention used for paper "Attention for Vision-Based Assistive and Automated Driving: A Review of Algorithms and...
(IEEE TITS 2024) WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation
A collection of 383 car logo images in a few size variations, with a JSON file for easier use.
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving
We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Techniques for deep learning with satellite & aerial imagery
Transformer: PyTorch Implementation of "Attention Is All You Need"
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc. 🦖
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models.
An MNIST-like fashion product database, with benchmarks.
A list of tools, papers and code related to Deepfake Detection.
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.
[ECCV 2024 Oral] PetFace: A Large-Scale Dataset and Benchmark for Animal Identification https://arxiv.org/abs/2407.13555
This repository releases a bearing failure dataset to support intelligent fault diagnosis research (an open-source bearing dataset collected in-house by the lab, covering both constant and time-varying rotational speeds).
Code repository for the ECCV paper "MSD: A Benchmark Dataset for Floor Plan of Building Complexes".
[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
Pre-rendered regularization images of men and women, mainly faces, intended to help generate more realistic images (without waxy skin).
[ECCV2024] "Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal", https://arxiv.org/abs/2407.16957
Time-Series Anomaly Detection Comprehensive Benchmark
MegaVul: the largest high-quality, extensible, continuously updated C/C++/Java vulnerability dataset.
A collection of 30+ open-source mechanical fault diagnosis and prognosis datasets, covering categories such as bearings, gears, and pumps (continuously updated).
The World's Largest Decentralized AGI Multimodal Dataset
A curated list of speech datasets (110+ datasets, 75+ easy to download).
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
🤗 AeroPath: An airway segmentation benchmark dataset with challenging pathology
The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains...
Developed a sophisticated machine learning model capable of generating diverse interview questions aligned with specific topics, ensuring depth of conversation. Integrated advanced Natural Language Pr...
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
This repository contains a reading list of papers on time series segmentation and is continuously being improved.
[ICLR 2024 Oral] Supervised Pre-Trained 3D Models for Medical Image Analysis (9,262 CT volumes + 25 annotated classes)
The human toll of Israel's ongoing genocide in names & numbers. Use the data from our APIs to tell their story.
[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
A collection of 10,000 Windows Chrome fingerprints, usable through an easy-to-use API and available as compressed (LZMA) or full-size JSON (see Releases). It's just 1.4 MB in size in compressed f...
A comprehensive survey of foundation models for weather and climate data understanding.
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
esProc SPL is a scripting language for data processing with a rich, well-designed function library and powerful syntax. It can be executed in a Java program through a JDBC interface and computing inde...
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
Code and data artifact for the NeurIPS 2023 paper "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context". `multispy` is an LSP client library in Python intended to be used to bu...
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
A fully-annotated, open-design dataset of autonomous and piloted high-speed flight
🤖 Dataset for TextSLAM: Visual SLAM with Semantic Planar Text Features. (ICRA2020 & TPAMI2023)