13 results found
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created
2018-05-11
1,743 commits to master branch, last one 14 days ago
Refine high-quality datasets and visual AI models
Created
2020-04-22
21,609 commits to develop branch, last one 10 hours ago
A Doctor for your data
Created
2023-05-02
32 commits to master branch, last one 10 months ago
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data...
image, python, dataset, data-curation, deep-learning, visual-search, visualization, image-analysis, image-processing, image-similarity, machine-learning, object-detection, data-augmentation, novelty-detection, outlier-detection, image-classfication, visualization-tools, image-classification, image-duplicate-detection
Created
2022-05-11
1,322 commits to main branch, last one 2 months ago
Interactively explore unstructured datasets from your dataframe.
Created
2023-01-29
1,481 commits to main branch, last one 3 months ago
A curated, but incomplete, list of data-centric AI resources.
Created
2023-03-07
69 commits to main branch, last one 4 months ago
Curated list of open source tooling for data-centric AI on unstructured data.
nlp, data-drift, awesome-list, noisy-labels, data-curation, deep-learning, bias-detection, explainable-ai, feature-vector, synthetic-data, active-learning, computer-vision, data-centric-ai, data-versioning, machine-learning, outlier-detection, data-visualization, documentation-only, uncertainty-estimation, robust-machine-learning
Created
2023-02-27
34 commits to main branch, last one 11 months ago
Scalable data pre-processing and curation toolkit for LLMs
Created
2024-03-14
182 commits to main branch, last one 10 hours ago
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
Created
2020-06-10
201 commits to master branch, last one 2 years ago
A library for detecting problematic data segments in structured and unstructured data with a few lines of code.
Created
2023-06-14
404 commits to main branch, last one 11 months ago
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
Created
2019-03-29
96 commits to master branch, last one 3 years ago
A tool for downloading from public image boards (which allow scraping) / preview your images & tags / edit your images & tags. Additional tabs for downloading other desired code repositories as well a...
Created
2023-05-08
227 commits to main branch, last one 6 days ago
Code and data for "Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation" (EMNLP 2023)
Created
2023-07-03
15 commits to main branch, last one 6 months ago