Statistics for topic data-engineering
RepositoryStats tracks 579,129 GitHub repositories; of these, 292 are tagged with the data-engineering topic. The most common primary language for repositories using this topic is Python (127). Other languages include Jupyter Notebook (34), Go (17), JavaScript (12), and TypeScript (11).
Stargazers over time for topic data-engineering
Most starred repositories for topic data-engineering
Trending repositories for topic data-engineering
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Turns Data and AI algorithms into production-ready web applications in no time.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
Materials for the Deploy and Monitor ML Pipelines with Python, Docker and GitHub Actions workshop at the PyData NYC 2024 conference
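The Data Flow Engine is a low-level, data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses separation of management and execution, multiple flow layers, and a plugin library to handle practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.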
Code for "Efficient Data Processing in Spark" Course
Dagster Labs' open-source data platform, built with Dagster.
Apache Superset is a Data Visualization and Data Exploration Platform
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
Yet another repository with basic concepts, technical challenges, and resources on data engineering in Spanish 🧙✨
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.