Statistics for topic etl
RepositoryStats tracks 518,989 GitHub repositories; of these, 234 are tagged with the etl topic. The most common primary language for repositories using this topic is Python (79). Other languages include Go (35), Java (30), and TypeScript (13).
Stargazers over time for topic etl
Most starred repositories for topic etl
Trending repositories for topic etl
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Apache Doris is an easy-to-use, high-performance, unified analytics database.
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Radient turns many data types (not just text) into vectors for similarity search, clustering, regression analysis, and more.
A curated list of open source tools used in analytical stacks and data engineering ecosystem
Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack.
A curated list of awesome system integration software and resources.
An orchestration platform for the development, production, and observation of data assets.
All of my individual learning materials, documents, and notes from the process of getting the Coursera IBM Data Engineer Professional Certificate specialization are stored in this repository.
Sample project to demonstrate data engineering best practices
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.
A compute framework for turning complex data into vectors. Build multimodal vectors with ease and define weights at query time so you don't need a custom reranking algorithm to optimise results. Go st...
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Orbital automates integration between data sources (APIs, databases, queues, and functions). BFFs, API composition, and ETL pipelines that adapt as your specs change.
⚡ valmi.io reverse ETL (data activation) is the open source (OSS) data activation platform to load data from warehouses into Webhooks and SaaS tools like Klaviyo, Facebook Ads, Salesforce, Braze etc...
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.