15 results found

800
10.2k
agpl-3.0
86
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created 2018-05-11
1,770 commits to master branch, last one 4 days ago
215
2.8k
other
128
A Doctor for your data
Created 2023-05-02
33 commits to master branch, last one 2 months ago
79
1.7k
other
23
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data...
Created 2022-05-11
1,346 commits to main branch, last one 2 months ago
85
1.2k
mit
18
Interactively explore unstructured datasets from your dataframe.
Created 2023-01-29
1,527 commits to main branch, last one 3 months ago
A curated, but incomplete, list of data-centric AI resources.
Created 2023-03-07
69 commits to main branch, last one 8 months ago
6
79
bsd-2-clause
5
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
Created 2020-06-10
201 commits to master branch, last one 2 years ago
A library for detecting problematic data segments in structured and unstructured data with a few lines of code.
Created 2023-06-14
404 commits to main branch, last one about a year ago
19
51
bsd-3-clause
2
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
Created 2019-03-29
96 commits to master branch, last one 4 years ago
5
48
apache-2.0
3
[ICLR 2025] Improving Data Efficiency via Curating LLM-Driven Rating Systems
Created 2025-02-11
110 commits to main branch, last one 2 days ago
A tool for downloading from public image boards (which allow scraping) / preview your images & tags / edit your images & tags. Additional tabs for downloading other desired code repositories as well a...
Created 2023-05-08
250 commits to main branch, last one about a month ago
1
30
apache-2.0
3
Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023)
Created 2023-07-03
15 commits to main branch, last one 10 months ago
🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors (NeurIPS'24).
Created 2024-02-14
111 commits to main branch, last one 3 days ago