Trending repositories for topic dataset
Label Studio is a multi-type data labeling and annotation tool with standardized output format
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
A list of tools, papers and code related to Deepfake Detection.
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
Paper list and datasets for industrial image anomaly/defect detection (continuously updated).
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Transformer: PyTorch Implementation of "Attention Is All You Need"
An MNIST-like fashion product database. Benchmark 👇
Documentation on how to access and use the Quick, Draw! Dataset.
A quick guide (especially) for trending instruction finetuning datasets
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. Assisting you to increase your dataset images & labels quality and reduce your data oper...
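A collection of public Chinese medical NLP resources: terminologies, corpora, word embeddings, pretrained models, knowledge graphs, named entity recognition, QA, information extraction, models, papers, etc.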
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
[SIGIR 2022] Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
CircuitNet: An Open-Source Dataset for Machine Learning Applications in Electronic Design Automation (EDA)
The human toll of Israel's ongoing genocide in names & numbers. Use the data from our APIs to tell their story.
🔡 List of Tools, Libraries, Models, Datasets and other resources for Turkish NLP.
[ECCV 2022] Map-free Visual Relocalization: Metric Pose Relative to a Single Image
This is a repository that includes papers, code, and datasets on domain generalization-based fault diagnosis and prognosis (continuously updated).
DialogSum: A Real-life Scenario Dialogue Summarization Dataset - Findings of ACL 2021
Automated Resume Screening System using Machine Learning (With Dataset)
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Techniques for deep learning with satellite & aerial imagery
Object detection dataset tools. Supports conversion between DOTA, COCO, YOLO, and Pascal VOC dataset formats, and DOTA gap splitting.
🤖 Dataset for TextSLAM: Visual SLAM with Semantic Planar Text Features. (ICRA2020 & TPAMI2023)
SEA is an automated paper review framework capable of generating comprehensive and high-quality review feedback with high consistency for papers, thereby assisting researchers in improving the quality...
Developed a sophisticated machine learning model capable of generating diverse interview questions aligned with specific topics, ensuring depth of conversation. Integrated advanced Natural Language Pr...
An open-source mechanical failure dataset is available, comprising 30+ categories including bearings, gears, pumps, and others. (More than 30 open-source fault diagnosis and prognosis datasets, continuously updated.)
Deep Learning Based Steel Pipe Weld Defect Detection
SB Curated is a curated dataset of Solidity smart contracts annotated with tagged vulnerabilities. The dataset was created to evaluate the accuracy of automated analysis tools.
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
📈 Currently the largest collection of industrial defect detection datasets and papers. Continuously summarizes important open-source datasets and key papers in the field of surface defect research.
Android malware source code dataset collected from public resources.
Data research, preparation, and manipulation nodes for model trainers and artists.
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
Versatile computational pipeline for processing protein structure data for deep learning applications.
A curated collection of public industrial datasets.
[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
This repository releases a bearing failure dataset to support intelligent fault diagnosis research (an open-source bearing dataset collected in the lab, covering both constant and time-varying rotational speeds).
A complete list of IATA Airports including IATA code, ICAO code, Time zone, name, city code, two-letter ISO country code, URL, elevation above sea level in feet, coordinates in decimal degrees, geo en...
WildlifeDatasets: An open-source toolkit for animal re-identification
A comprehensive survey of foundation models for weather and climate data understanding.
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
This repository contains a reading list of papers on Time Series Segmentation. This repository is still being continuously improved.
[ICLR 2024 Oral] Supervised Pre-Trained 3D Models for Medical Image Analysis (9,262 CT volumes + 25 annotated classes)
Code and data artifact for the NeurIPS 2023 paper "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context". `multispy` is an LSP client library in Python intended to be used to bu...
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA
A collection of 10,000 collected Windows Chrome fingerprints. Usable with an easy-to-use API, available as compressed (lzma) or full-size JSON (view Releases). It's just 1.4 MB in size in compressed f...
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
esProc SPL is a scripting language for data processing, with well-designed rich library functions and powerful syntax, which can be executed in a Java program through JDBC interface and computing inde...
Generate textbook-quality synthetic LLM pretraining data
The world's first roller coaster SLAM dataset
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
[NeurIPS 2023] AbdomenAtlas 1.0 (5,195 CT volumes + 9 annotated classes)
Synthetic Role-Play Conversation Dataset Generation
A tool for efficient semi-supervised video object segmentation (great results with minimal manual labor) and a dataset for benchmarking
Dataset Helper program to automatically select, rescale, and tag datasets (composed of images and text) for machine learning training.