Trending repositories for topic data-quality
Learn how to design, develop, deploy and iterate on production-grade ML applications.
The open-source tool for building high-quality datasets and computer vision models
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
OpenMetadata is a unified platform for discovery, observability, and governance powered by a central metadata repository, in-depth lineage, and seamless team collaboration.
lakeFS - Data version control for your data lake | Git for data
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Always know what to expect from your data.
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility acr...
📙 Awesome Data Catalogs and Observability Platforms.
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML f...
Compilation of high-profile real-world examples of failed machine learning projects
A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profiling data 🚀
A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data...
A tool to help improve data quality standards in observational data science.
A curated, but incomplete, list of data-centric AI resources.
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems cause...
Possibly the fastest DataFrame-agnostic quality check library in town.
Free Open-source ML observability course for data scientists and ML engineers. Learn how to monitor and debug your ML models in production.
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count uniqu...
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Automatically find issues in image datasets and practice data-centric computer vision.
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Define, govern, and model event data for warehouse-first product analytics.
FeatHub - A stream-batch unified feature store for real-time machine learning
A Data Centric NER annotation tool for your Named Entity Recognition projects