Trending repositories for topic etl
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
An orchestration platform for the development, production, and observation of data assets.
Apache Doris is an easy-to-use, high-performance, unified analytics database.
Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Available both self-hosted and cloud-hosted.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming ...
A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal vector embeddings.
Efficient data transformation and modeling framework that is backwards compatible with dbt.
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
The best place to learn data engineering. Built and maintained by the data engineering community.
Fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and co...
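A data processing and task scheduling system built with Python and large language models (LLMs). It supports data source management, data model management, data integration, data query API endpoints, low-code customizable data processing task templates, and both single-task and DAG workflow scheduling. An integrated LLM module provides RAG knowledge-base Q&A, conversational Q&A over data from connected sources, and interactive data analysis.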
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
go-etl is a toolset for data extraction, transformation, and loading, providing powerful data synchronization capabilities.
CLI tool for dbt users that simplifies the creation of staging model files (YAML and SQL).
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
Real-time data sync from MySQL/PostgreSQL/MongoDB to Meilisearch
A curated list with resources about node-based UIs
Server application to serve U.S. federal spending data via a RESTful API
Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgra...
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Open-source data platform to build event-driven systems. It's like Debezium for golang :)
🌟 Examples of use cases that utilize Decodable, as well as demos for related open-source projects such as Apache Flink, Debezium, and Postgres.
End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence
A curated list of open source tools used in analytical stacks and the data engineering ecosystem
This is a repository to showcase my background, skills, and projects, and to track my progress in data analytics and data science.
This is a template you can use for your next data engineering portfolio project.
Projects done in the Data Engineer Nanodegree Program by Udacity.com
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in ...
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
Steampipe SQLite is a zero-ETL engine for SQLite. Virtual tables translate queries into live API calls for cloud services and APIs. Hundreds of plugins with thousands of documented examples.
DataForge helps data teams write functional transformation pipelines by leveraging software engineering principles
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Orbital automates integration between data sources (APIs, databases, queues, and functions): BFFs, API composition, and ETL pipelines that adapt as your specs change.
A declarative framework for building, maintaining, and analyzing graph data
Sample project to demonstrate data engineering best practices
When no one can tell the difference between art and an empty canvas, the meaning is lost.