Statistics for topic data-engineering
RepositoryStats tracks 650,729 GitHub repositories, of which 347 are tagged with the data-engineering topic. The most common primary language for repositories using this topic is Python (151). Other languages include Jupyter Notebook (41), Go (18), TypeScript (15), JavaScript (13), Scala (13), and Rust (12).
Stargazers over time for topic data-engineering
Most starred repositories for topic data-engineering
Trending repositories for topic data-engineering
Real-time data transformation framework for AI. Ultra performant, with incremental processing.
Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems techniques.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Real-time data transformation framework for AI. Ultra performant, with incremental processing.
A declarative PySpark framework for row- and aggregate-level data quality validation.
The most comprehensive SQL guide from a real-world expert! Learn everything from basics to advanced queries, optimizations, and real-world SQL
Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems techniques.
Dataform Tools - VS Code extension to run and visualise Dataform data pipelines and much more
Real-time data transformation framework for AI. Ultra performant, with incremental processing.
Apache Superset is a Data Visualization and Data Exploration Platform
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems techniques.
A declarative PySpark framework for row- and aggregate-level data quality validation.
Real-time data transformation framework for AI. Ultra performant, with incremental processing.
The most comprehensive SQL guide from a real-world expert! Learn everything from basics to advanced queries, optimizations, and real-world SQL
An MCP (Model Context Protocol) server for interacting with dbt.
Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems techniques.
Interactive Python TUI for visualizing and analyzing files in multiple formats
A declarative PySpark framework for row- and aggregate-level data quality validation.
Real-time data transformation framework for AI. Ultra performant, with incremental processing.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
An MCP (Model Context Protocol) server for interacting with dbt.
Real-time data transformation framework for AI. Ultra performant, with incremental processing.
ELT data pipeline implementation in a data warehousing environment
The most comprehensive SQL guide from a real-world expert! Learn everything from basics to advanced queries, optimizations, and real-world SQL
Real-time data transformation framework for AI. Ultra performant, with incremental processing.
Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems techniques.
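The Data Flow Engine is a low-level, data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture built on separating management from execution, multiple flow layers, and a plugin library to handle practical data problems: large-scale data tasks, high-frequency data reporting, high-frequency data collection, and compatibility with heterogeneous data.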
Turns Data and AI algorithms into production-ready web applications in no time.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Collection of Snowflake Notebook demos, tutorials, and examples
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
Data Engineering Project with Hadoop HDFS and Kafka