Statistics for topic data-engineering
RepositoryStats tracks 633,144 GitHub repositories, of which 333 are tagged with the data-engineering topic. The most common primary language for repositories using this topic is Python (144). Other languages include Jupyter Notebook (39), Go (18), TypeScript (15), JavaScript (12), Scala (12), and Rust (11).
Stargazers over time for topic data-engineering
Most starred repositories for topic data-engineering
Trending repositories for topic data-engineering
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ETL framework for indexing data for AI use cases such as RAG, with real-time incremental updates and support for Lego-like custom logic.
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principles on Adventure Works. Features programmatic model generation, e...
Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems techniques.
Data Engineering Project with Hadoop HDFS and Kafka
Community supported integrations for the Dagster platform.
RushDB is an instant database for modern apps and DS/ML ops built on top of Neo4j
This repository contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.
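The Data Flow Engine is a low-level, data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture built on separating management from execution, multiple flow layers, and a plugin library to handle practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and compatibility across heterogeneous data.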
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Turns Data and AI algorithms into production-ready web applications in no time.
Code for "Efficient Data Processing in Spark" Course
This repo contains "Databricks Certified Data Engineer Professional" Questions and related docs.