Trending repositories for topic data-pipelines
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...
Preswald is a WASM packager for Python-based interactive data apps: bundle full complex data workflows, particularly visualizations, into single files, runnable completely in-browser, using Pyodide, D...
An orchestration platform for the development, production, and observation of data assets.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Apache DolphinScheduler is the modern data orchestration platform. Create high-performance workflows quickly with low code.
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
🦀 event stream processing for developers to collect and transform data in motion to power responsive data intensive applications.
The best place to learn data engineering. Built and maintained by the data engineering community.
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services
A Pub/Sub for Tables based data integration platform, to discover, publish, modify and consume data effortlessly.
A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
Smart Automation Tool for building modern Data Lakes and Data Pipelines
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...
A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
A high-performance, extremely flexible, and easily extensible universal workflow engine.
Main repo including core data model, data marts, data quality tests, and terminology sets.