Trending repositories for topic etl
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Apache Doris is an easy-to-use, high performance and unified analytics database.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Ylem is an open-source platform for real-time data streaming orchestration
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming ...
Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
Efficient data transformation and modeling framework that is backwards compatible with dbt.
The best place to learn data engineering. Built and maintained by the data engineering community.
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI
Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal vector embeddings.
A curated list with resources about node-based UIs
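Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.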
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Declarative text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
DataForge helps data teams write functional transformation pipelines by leveraging software engineering principles
A curated list of open source tools used in analytics platforms and data engineering ecosystem
🌟 Examples of use cases that utilize Decodable, as well as demos for related open-source projects such as Apache Flink, Debezium, and Postgres.
go-etl is a toolset for data extraction, transformation, and loading that provides powerful data synchronization capabilities.
Construct a modern data stack and orchestrate the workflows to create high-quality data for analytics and ML applications.
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a...
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
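A data processing and task scheduling system built with Python and LLMs. It supports data source management, data model management, data integration, data query API endpoints, low-code custom data processing task templates, and scheduling of single tasks and DAG task workflows. An integrated LLM module provides RAG knowledge-base Q&A, conversational Q&A over connected data sources, and interactive data analysis.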
End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in ...
Built a real-time streaming pipeline to extract stock data, using Apache NiFi, Debezium, Kafka, and Spark Streaming. Loaded the transformed data into Glue database and created real-time dashboards usi...
Sample project to demonstrate data engineering best practices
Realtime sync data from MySQL/PostgreSQL/MongoDB to Meilisearch