63 results found

822
10.5k
agpl-3.0
87
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created 2018-05-11
1,772 commits to master branch, last one 4 days ago
714
6.2k
apache-2.0
255
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
Created 2019-10-21
4,646 commits to master branch, last one 2 days ago
279
4.4k
apache-2.0
47
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckD...
Created 2022-07-07
2,296 commits to main branch, last one 3 days ago
367
4.0k
unknown
46
Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing...
Created 2019-09-29
4,603 commits to master branch, last one 3 days ago
125
2.7k
apache-2.0
32
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...
Created 2020-08-14
936 commits to mainline branch, last one 3 months ago
200
2.2k
apache-2.0
28
Scalable and efficient data transformation framework - backwards compatible with dbt.
Created 2022-09-23
3,473 commits to main branch, last one 11 hours ago
182
2.0k
apache-2.0
12
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Created 2021-08-30
5,286 commits to master branch, last one 6 days ago
337
2.0k
apache-2.0
49
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, 20+ connectors
Created 2016-08-19
571 commits to fdd/main branch, last one 8 months ago
175
2.0k
mit
12
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Created 2021-06-21
11,866 commits to main branch, last one a day ago
409
1.9k
apache-2.0
52
Cloud Native DataOps & AIOps Platform | 云原生数智运维平台
Created 2022-03-16
1,466 commits to main branch, last one about a year ago
240
1.1k
apache-2.0
40
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI.
Created 2019-01-23
1,189 commits to master branch, last one a day ago
📙 Awesome Data Catalogs and Observability Platforms.
Created 2021-07-14
94 commits to main branch, last one 12 days ago
154
748
apache-2.0
15
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created 2021-03-22
487 commits to main branch, last one about a year ago
94
671
bsd-3-clause
34
Tenzir is the data pipeline engine for security teams.
Created 2010-09-23
24,257 commits to main branch, last one 4 days ago
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
Created 2019-12-06
781 commits to main branch, last one 4 days ago
A list of tools for annotating data, managing annotations, etc.
Created 2018-11-08
87 commits to master branch, last one 2 years ago
44
515
apache-2.0
12
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Created 2016-03-25
9,968 commits to master branch, last one 7 days ago
35
466
apache-2.0
18
Titan Core - Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for ...
Created 2023-05-13
276 commits to main branch, last one about a month ago
59
445
apache-2.0
15
One framework to develop, deploy and operate data workflows with Python and SQL.
Created 2021-07-20
2,194 commits to main branch, last one 28 days ago
108
379
apache-2.0
10
Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel-on-Flink engine, the Flink Kubernetes Operator, and the Doris operator.
Created 2022-04-23
886 commits to dev branch, last one 3 months ago
65
347
agpl-3.0
23
Power BI DevOps & Source Control Tool
Created 2020-05-31
713 commits to main branch, last one about a month ago
62
336
apache-2.0
12
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
Created 2021-01-29
529 commits to main branch, last one about a year ago
12
335
apache-2.0
8
The data-validation toolkit for enhanced dbt (data build tool) PR review
Created 2023-10-06
2,308 commits to main branch, last one 15 hours ago
35
284
apache-2.0
11
Frontier is an all-in-one user management platform that provides identity, access and billing management to help organizations secure their systems and data. (Open source alternative to Clerk, WorkOS)
Created 2021-02-26
1,118 commits to main branch, last one 13 hours ago
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
Created 2022-02-11
130 commits to main branch, last one 2 months ago
41
272
apache-2.0
14
Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.
Created 2021-03-22
825 commits to main branch, last one about a year ago
21
263
apache-2.0
12
An open source development framework to help you build data workflows and modern data architecture on AWS.
Created 2022-02-16
566 commits to main branch, last one 25 days ago
41
228
apache-2.0
29
Stencil is a schema registry that provides schema management and validation dynamically, efficiently, and reliably to ensure data compatibility across applications.
Created 2019-02-16
314 commits to main branch, last one 27 days ago
30
209
apache-2.0
14
Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.
Created 2021-03-22
213 commits to main branch, last one 6 months ago
42
204
apache-2.0
7
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
Created 2021-03-22
375 commits to main branch, last one 5 months ago