Trending repositories for topic etl
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Available both self-hosted and cloud-hosted.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
An orchestration platform for the development, production, and observation of data assets.
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Apache Doris is an easy-to-use, high-performance and unified analytics database.
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
Efficient data transformation and modeling framework that is backwards compatible with dbt.
Open-source data anonymization and synthetic data orchestration for developers. Create high-fidelity synthetic data and sync it across your environments.
Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
Fastest open-source tool for replicating databases to Apache Iceberg or a data lakehouse. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics, starting with MongoDB.
Actively curated list of awesome BI tools. PRs welcome!
The best place to learn data engineering. Built and maintained by the data engineering community.
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
A declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
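A data processing and task scheduling system built with Python and large language models (LLMs). Supports data source management, data model management, data integration, data query API endpoints, low-code customizable data processing task templates, and workflow scheduling for both single tasks and DAG tasks. An integrated LLM module provides RAG knowledge-base Q&A, conversational Q&A over data connected from multiple data sources, and interactive data analysis.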
A curated list of open-source tools used in analytics platforms and the data engineering ecosystem
Sling is a CLI tool that extracts data from a source storage/database and loads it into a target storage/database.
The most advanced data processing framework for building scalable data processing pipelines and moving data between various data sources and destinations.
ReplicaDB is an open-source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases.
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
A curated list with resources about node-based UIs
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
A curated collection of AI, data engineering, and DevOps projects featuring real-world applications, advanced techniques, and tutorials, ideal for learners and practitioners exploring data science and ...
This is a template you can use for your next data engineering portfolio project.
Ape Data Transfer Suite, written in Rust. Provides ultra-fast data replication between MySQL, PostgreSQL, Redis, MongoDB, Kafka and ClickHouse, ideal for disaster recovery (DR) and migration scenarios...
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming ...
🧙 Build, run, and manage data pipelines for integrating and transforming data.
This Node-RED custom node lets you consume Google Sheets spreadsheets on demand.
Use SQL to instantly query repositories, users, gists and more from GitHub. Open source CLI. No DB required.
Low-cost CRM architecture with Gen AI, designed for startups that need to process and analyze sales data efficiently.
Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
An end-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence.
Decentralized, account-centric programmable indexing network for web3. Supports blockchain explorers, on-chain portfolios, social graphs, and ZK coprocessors. Open-source EVM-compatible indexer availa...
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Ylem is an open-source platform for real-time data streaming orchestration
DataForge helps data teams write functional transformation pipelines by leveraging software engineering principles
Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from Blizzard’s Hearthstone API. Focused on card statistics and att...
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
Sample project to demonstrate data engineering best practices
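Several of the tools listed above, such as Sling and ReplicaDB, describe the same core pattern. They extract rows from a source database, optionally transform them, and load them into a target. The following is a minimal sketch of that extract-transform-load pattern. It assumes two in-memory SQLite databases and a hypothetical `orders` table. The function names `extract`, `transform`, and `load` are illustrative only and do not reproduce any listed tool's implementation, CLI, or API.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape for the illustrative `orders` table: (id, customer, amount).
Row = Tuple[int, str, float]

INSERT_SQL = "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)"


def extract(conn: sqlite3.Connection) -> Iterator[Row]:
    """Stream rows from the (hypothetical) source `orders` table."""
    yield from conn.execute("SELECT id, customer, amount FROM orders ORDER BY id")


def transform(rows: Iterable[Row]) -> Iterator[Row]:
    """Normalize customer names (trim + lowercase) and round amounts to cents."""
    for order_id, customer, amount in rows:
        yield order_id, customer.strip().lower(), round(amount, 2)


def load(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 500) -> int:
    """Write rows into the target table in batches and return how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    total = 0
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            # Flush a full batch; INSERT OR REPLACE makes re-runs idempotent by id.
            conn.executemany(INSERT_SQL, batch)
            total += len(batch)
            batch.clear()
    if batch:
        # Flush whatever is left after the last full batch.
        conn.executemany(INSERT_SQL, batch)
        total += len(batch)
    conn.commit()
    return total


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "  Alice ", 10.25), (2, "BOB", 3.5)],
    )
    target = sqlite3.connect(":memory:")
    written = load(target, transform(extract(source)))
    print(written, target.execute("SELECT * FROM orders ORDER BY id").fetchall())
```

The stages are chained as generators, so rows move from source to target without loading the whole table into memory. Batching the writes with `executemany` is a common way to keep the load step efficient. Real replication tools add concerns this sketch omits, such as incremental sync, schema mapping, and retries.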