25 results found
- Filter by Primary Language:
- Python (18)
- Java (3)
- C++ (1)
- TypeScript (1)
- Vue (1)
Data quality profiling and exploratory data analysis for Pandas and Spark DataFrames in one line of code.
Created
2016-01-09
1,575 commits to develop branch, last one 27 days ago
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created
2018-05-11
1,772 commits to master branch, last one 12 days ago
Always know what to expect from your data.
Created
2017-09-11
13,107 commits to develop branch, last one 3 days ago
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team colla...
Created
2021-08-01
12,384 commits to main branch, last one 6 hours ago
Visualize and compare datasets, target values and associations, with one line of code.
Created
2020-05-09
135 commits to master branch, last one about a year ago
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Created
2020-12-14
788 commits to main branch, last one 19 days ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Created
2021-07-07
827 commits to main branch, last one 2 months ago
Automatically find issues in image datasets and practice data-centric computer vision.
Created
2022-05-26
338 commits to main branch, last one 19 days ago
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Created
2022-04-02
375 commits to dev branch, last one 3 days ago
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Created
2016-03-25
9,968 commits to master branch, last one 14 days ago
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Created
2020-04-23
549 commits to master branch, last one 2 months ago
Code review for data in dbt
Created
2022-03-31
3,221 commits to main branch, last one 3 months ago
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Created
2016-05-15
513 commits to master branch, last one 6 days ago
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
data-mining, correlations, data-science, spreadsheets, tabular-data, data-cleaning, data-analytics, data-cleansing, data-profiling, data-wrangling, data-engineering, data-exploration, anomaly-detection, feature-selection, data-preprocessing, feature-extraction, feature-engineering, knowledge-discovery, data-mining-algorithms, exploratory-data-analysis
Created
2020-04-09
1,677 commits to main branch, last one 12 days ago
Databricks framework to validate the data quality of PySpark DataFrames
Created
2024-04-23
102 commits to main branch, last one 13 days ago
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Created
2018-08-16
2,174 commits to develop branch, last one 2 years ago
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML f...
Created
2022-03-08
11,579 commits to develop branch, last one 20 days ago
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility acr...
Created
2024-04-18
125 commits to main branch, last one 14 days ago
Swiple enables you to easily observe, understand, validate and improve the quality of your data
Created
2022-04-15
80 commits to main branch, last one about a year ago
A set of scripts to pull metadata and data profiling metrics from relational database systems
Created
2017-02-08
60 commits to master branch, last one about a year ago
Papers about training data quality management for ML models.
Created
2024-03-05
58 commits to main branch, last one 2 months ago
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
Created
2022-02-06
63 commits to main branch, last one 9 months ago
Open-source metadata collector based on ODD Specification
This repository has been archived
Created
2022-02-10
259 commits to main branch, last one about a year ago
Client interface to Cleanlab Studio and the Trustworthy Language Model
llm, automl, annotations, data-quality, data-science, noisy-labels, data-cleaning, data-curation, data-labeling, data-profiling, computer-vision, data-centric-ai, data-validation, structured-data, machine-learning, model-deployment, outlier-detection, text-classification, image-classification, natural-language-processing
Created
2022-03-03
851 commits to main branch, last one 2 months ago