14 results found
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created
2018-05-11
1,749 commits to master branch, last one 8 days ago
Refine high-quality datasets and visual AI models
Created
2020-04-22
22,327 commits to develop branch, last one 4 hours ago
A Doctor for your data
Created
2023-05-02
32 commits to master branch, last one 12 months ago
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data...
image
python
dataset
data-curation
deep-learning
visual-search
visualization
image-analysis
image-processing
image-similarity
machine-learning
object-detection
data-augmentation
novelty-detection
outlier-detection
image-classfication
visualization-tools
image-classification
image-duplicate-detection
Created
2022-05-11
1,342 commits to main branch, last one 2 days ago
Interactively explore unstructured datasets from your dataframe.
Created
2023-01-29
1,527 commits to main branch, last one about a month ago
A curated, but incomplete, list of data-centric AI resources.
Created
2023-03-07
69 commits to main branch, last one 6 months ago
Curated list of open source tooling for data-centric AI on unstructured data.
nlp
data-drift
awesome-list
noisy-labels
data-curation
deep-learning
bias-detection
explainable-ai
feature-vector
synthetic-data
active-learning
computer-vision
data-centric-ai
data-versioning
machine-learning
outlier-detection
data-visualization
documentation-only
uncertainty-estimation
robust-machine-learning
Created
2023-02-27
34 commits to main branch, last one about a year ago
Scalable data preprocessing and curation toolkit for LLMs
Created
2024-03-14
242 commits to main branch, last one 3 days ago
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
Created
2020-06-10
201 commits to master branch, last one 2 years ago
A library for detecting problematic data segments in structured and unstructured data with a few lines of code.
Created
2023-06-14
404 commits to main branch, last one about a year ago
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
Created
2019-03-29
96 commits to master branch, last one 3 years ago
A tool for downloading from public image boards (which allow scraping) / preview your images & tags / edit your images & tags. Additional tabs for downloading other desired code repositories as well a...
Created
2023-05-08
246 commits to main branch, last one 13 days ago
Code and data for "Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation" (EMNLP 2023)
Created
2023-07-03
15 commits to main branch, last one 8 months ago
🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors (NeurIPS'24).
Created
2024-02-14
98 commits to main branch, last one 2 months ago