Statistics for topic etl
RepositoryStats tracks 596,206 GitHub repositories, of which 264 are tagged with the etl topic. The most common primary language for repositories using this topic is Python (92). Other languages include Go (39), Java (32), TypeScript (13), Rust (12), and JavaScript (11).
Stargazers over time for topic etl
Most starred repositories for topic etl
Trending repositories for topic etl
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
Ape Data Transfer Suite, written in Rust. Provides ultra-fast data replication between MySQL, PostgreSQL, Redis, MongoDB, Kafka and ClickHouse, ideal for disaster recovery (DR) and migration scenarios...
Sample project to demonstrate data engineering best practices
A compute framework for building Search, RAG, Recommendations and Analytics over complex structured & unstructured data.
Fastest open-source tool for replicating Databases to Apache Iceberg or Data Lakehouse. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Starting with MongoDB
Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.