24 results found

1.7k
12.8k
mit
149
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Created 2016-01-09
1,571 commits to develop branch, last one 3 days ago
801
10.3k
agpl-3.0
86
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created 2018-05-11
1,770 commits to master branch, last one 13 days ago
1.2k
6.3k
apache-2.0
47
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team colla...
Created 2021-08-01
11,998 commits to main branch, last one a day ago
Visualize and compare datasets, target values and associations, with one line of code.
Created 2020-05-09
135 commits to master branch, last one about a year ago
226
2.0k
apache-2.0
14
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Created 2020-12-14
786 commits to main branch, last one 7 days ago
234
1.5k
apache-2.0
36
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created 2017-07-13
6,411 commits to develop branch, last one about a year ago
121
1.3k
apache-2.0
18
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Created 2021-07-07
827 commits to main branch, last one about a month ago
72
1.1k
agpl-3.0
16
Automatically find issues in image datasets and practice data-centric computer vision.
Created 2022-05-26
335 commits to main branch, last one about a year ago
178
597
apache-2.0
13
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Created 2022-04-02
371 commits to dev branch, last one about a month ago
44
516
apache-2.0
12
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Created 2016-03-25
9,967 commits to master branch, last one 2 months ago
210
454
agpl-3.0
35
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Created 2016-05-15
511 commits to master branch, last one 22 days ago
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
Created 2020-04-09
1,654 commits to main branch, last one 9 days ago
34
237
other
7
Databricks framework to validate Data Quality of pySpark DataFrames
Created 2024-04-23
98 commits to main branch, last one 3 days ago
35
141
apache-2.0
11
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Created 2018-08-16
2,174 commits to develop branch, last one 2 years ago
24
133
apache-2.0
9
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML f...
Created 2022-03-08
11,575 commits to develop branch, last one 2 months ago
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility acr...
Created 2024-04-18
117 commits to main branch, last one 3 days ago
11
82
other
2
Swiple enables you to easily observe, understand, validate and improve the quality of your data
Created 2022-04-15
80 commits to main branch, last one about a year ago
A set of scripts to pull metadata and data profiling metrics from relational database systems.
Created 2017-02-08
60 commits to master branch, last one 11 months ago
Papers about training data quality management for ML models.
Created 2024-03-05
58 commits to main branch, last one about a month ago
5
44
apache-2.0
3
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
Created 2022-02-06
63 commits to main branch, last one 8 months ago
Open-source metadata collector based on ODD Specification
This repository has been archived
Created 2022-02-10
259 commits to main branch, last one about a year ago