Statistics for topic dataset
RepositoryStats tracks 561,684 GitHub repositories, of which 1,095 are tagged with the dataset topic. The most common primary language for repositories using this topic is Python (571). Other languages include Jupyter Notebook (148), C++ (25), JavaScript (23), HTML (19), MATLAB (15), and R (15).
Stargazers over time for topic dataset
Most starred repositories for topic dataset
Trending repositories for topic dataset
Label Studio is a multi-type data labeling and annotation tool with standardized output format
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
[SIGIR 2022] Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
🔡 List of Tools, Libraries, Models, Datasets and other resources for Turkish NLP.
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 Providing higher-quality, richer, and more "digestible" data for large models!
Object Detection Dataset Tools: supports DOTA, COCO, YOLO, and Pascal VOC dataset conversion and DOTA gap splitting.
🤖 Dataset for TextSLAM: Visual SLAM with Semantic Planar Text Features. (ICRA2020 & TPAMI2023)
SEA is an automated paper review framework capable of generating comprehensive and high-quality review feedback with high consistency for papers, thereby assisting researchers in improving the quality...
Android malware source code dataset collected from public resources.
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
This repository contains a reading list of papers on Time Series Segmentation and is still being continuously improved.
esProc SPL is a scripting language for data processing, with well-designed rich library functions and powerful syntax, which can be executed in a Java program through JDBC interface and computing inde...
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
Generate textbook-quality synthetic LLM pretraining data
[ICLR 2024 Oral] Supervised Pre-Trained 3D Models for Medical Image Analysis (9,262 CT volumes + 25 annotated classes)