Statistics for topic data-engineering
RepositoryStats tracks 595,858 GitHub repositories, of which 304 are tagged with the data-engineering topic. The most common primary language for repositories using this topic is Python (131). Other languages include Jupyter Notebook (35), Go (18), JavaScript (12), Scala (12), and TypeScript (12).
Stargazers over time for topic data-engineering
Most starred repositories for topic data-engineering
Trending repositories for topic data-engineering
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Turns Data and AI algorithms into production-ready web applications in no time.
Apache Superset is a Data Visualization and Data Exploration Platform
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
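The Data Flow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture of separated management and execution, multiple flow layers, and a plugin library to address practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.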
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Turns Data and AI algorithms into production-ready web applications in no time.
Home of the Open Data Contract Standard (ODCS).
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Turns Data and AI algorithms into production-ready web applications in no time.
Apache Superset is a Data Visualization and Data Exploration Platform
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Data Engineering Project with Hadoop HDFS and Kafka
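The Data Flow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture of separated management and execution, multiple flow layers, and a plugin library to address practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.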
This repo contains "Databricks Certified Data Engineer Professional" Questions and related docs.
Turns Data and AI algorithms into production-ready web applications in no time.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Apache Superset is a Data Visualization and Data Exploration Platform
An orchestration platform for the development, production, and observation of data assets.
Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from Blizzard’s Hearthstone API. Focused on card statistics and att...
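The Data Flow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture of separated management and execution, multiple flow layers, and a plugin library to address practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.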
This repo contains "Databricks Certified Data Engineer Professional" Questions and related docs.
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
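The Data Flow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture of separated management and execution, multiple flow layers, and a plugin library to address practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.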
Turns Data and AI algorithms into production-ready web applications in no time.
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Code for "Efficient Data Processing in Spark" Course
The data-validation toolkit for enhanced dbt (data build tool) PR review