Statistics for topic data-engineering
RepositoryStats tracks 518,325 GitHub repositories, of which 252 are tagged with the data-engineering topic. The most common primary language for repositories using this topic is Python (109). Other languages include Jupyter Notebook (29), Go (15), and JavaScript (11).
Most starred repositories for topic data-engineering
Trending repositories for topic data-engineering
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
Code for "Efficient Data Processing in Spark" Course
A curated list of open source tools used in analytical stacks and data engineering ecosystem
Apache Superset is a Data Visualization and Data Exploration Platform
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Code for "Efficient Data Processing in Spark" Course
A curated list of open source tools used in analytical stacks and data engineering ecosystem
All of my individual learning materials, documents, and notes from the process of getting the Coursera IBM Data Engineer Professional Certificate specialization are stored in this repository.
More than 2,000 data engineer interview questions.
Apache Superset is a Data Visualization and Data Exploration Platform
Turns Data and AI algorithms into production-ready web applications in no time.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Code for "Efficient Data Processing in Spark" Course
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
A Python package that creates fine-grained dbt tasks on Apache Airflow
🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack.
VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
Yet another repository with basic concepts, technical challenges, and resources on data engineering in Spanish 🧙✨
Turns Data and AI algorithms into production-ready web applications in no time.
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Orbital automates integration between data sources (APIs, Databases, Queues and Functions). BFFs, API Composition and ETL pipelines that adapt as your specs change.
Turns Data and AI algorithms into production-ready web applications in no time.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
A DuckDB extension to read data directly from databases supporting the ODBC interface