Trending repositories for topic data

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+364)

agpl-3.0

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

23,565 (+177)

Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

13,815 (+97)

run-llama/llama_index

LlamaIndex is a data framework for your LLM applications

37,479 (+90)

mit

TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

43,031 (+52)

mit

airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

16,518 (+38)

metabase/metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:

39,178 (+37)

akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! Open-source financial data interface library.

9,809 (+37)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+36)

bsd-3-clause

PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

17,837 (+34)

apache-2.0

quadratichq/quadratic

Quadratic | Spreadsheet with Python, SQL, and AI

3,091 (+32)

faker-js/faker

Generate massive amounts of fake data in the browser and node.js

13,208 (+26)

dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

2,835 (+20)

apache-2.0

vercel/swr

React Hooks for Data Fetching

30,745 (+15)

mit

theseus-rs/rsql

Command line SQL interface for relational databases and common data file formats

158 (+13)

apache-2.0

bchavez/Bogus

:card_index: A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

8,939 (+13)

Visualize-ML/Book6_First-Course-in-Data-Science

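Book 6, "The Way of Data" (《数据有道》) | Iris Book series: from basic arithmetic to machine learning. Criticism and corrections are welcome! Readers who submit many corrections will receive a free copy as thanks.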

2,113 (+11)

oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

13,330 (+10)

mit

olifolkerd/tabulator

Interactive Tables and Data Grids for JavaScript

6,823 (+10)

mit

apple/pkl

A configuration as code language with rich validation and tooling.

10,442 (+6)

apache-2.0

Last 3 days (relative gain)

theseus-rs/rsql

Command line SQL interface for relational databases and common data file formats

158 (+9%)

apache-2.0

chase-manning/pokemon-tcg-pocket-cards

An open source repo for data on the Pokemon TCG Cards

39 (+5%)

mit

ContextData/VectorETL

Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications

84 (+2%)

mit

trailheadapps/coral-cloud

Sample application that showcases Data Cloud, Agents and Prompts.

43 (+2%)

cc0-1.0

ebonnal/streamable

[Python] Stream-like manipulation of iterables.

172 (+2%)

apache-2.0

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+2%)

bsd-3-clause

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+2%)

agpl-3.0

Shorya22/Data-Analytics-Projects

Explore a collection of end-to-end data analytics projects showcasing SQL, Python, and Power BI. Gain valuable insights and solutions to real-world problems through data extraction, analysis, and visu...

59 (+2%)

IBM/data-prep-kit

Open source project for data preparation of LLM application builders

370 (+1%)

apache-2.0

pracdata/awesome-open-source-data-engineering

A curated list of open source tools used in analytics platforms and data engineering ecosystem

165 (+1%)

quadratichq/quadratic

Quadratic | Spreadsheet with Python, SQL, and AI

3,091 (+1%)

bitquery/widgets

Widgets for blockchain data visualizations

124 (+0.8%)

mit

NVIDIA/NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

682 (+0.7%)

apache-2.0

bitol-io/open-data-contract-standard

Home of the Open Data Contract Standard (ODCS).

411 (+0.7%)

apache-2.0

APA-Technology-Division/urban-and-regional-planning-resources

Community list of data & technology resources concerning the built environment and communities. 🏙️🌳🚌🚦🗺️

275 (+0.7%)

cc0-1.0

dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

2,835 (+0.7%)

apache-2.0

Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

13,815 (+0.7%)

felikcat/unlimited-hotspot

Removes speed restrictions on your hotspot internet (iOS, iPadOS, Android, Quectel), and allows hotspots on any plan (rooted Android & Quectel only).

153 (+0.7%)

wtfpl

dbt-labs/jaffle_shop_duckdb

Get started with dbt in less than 1 minute from `git clone` to `dbt docs serve` for free!

161 (+0.6%)

apache-2.0

ogbinar/DataEngineeringPilipinas

Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in ...

169 (+0.6%)

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+704)

agpl-3.0

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

23,565 (+539)

run-llama/llama_index

LlamaIndex is a data framework for your LLM applications

37,479 (+147)

mit

Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

13,815 (+123)

TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

43,031 (+98)

mit

airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

16,518 (+73)

metabase/metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:

39,178 (+73)

akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! Open-source financial data interface library.

9,809 (+71)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+61)

bsd-3-clause

PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

17,837 (+59)

apache-2.0

quadratichq/quadratic

Quadratic | Spreadsheet with Python, SQL, and AI

3,091 (+39)

faker-js/faker

Generate massive amounts of fake data in the browser and node.js

13,208 (+37)

dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

2,835 (+34)

apache-2.0

vercel/swr

React Hooks for Data Fetching

30,745 (+31)

mit

truefoundry/cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

3,424 (+28)

apache-2.0

theseus-rs/rsql

Command line SQL interface for relational databases and common data file formats

158 (+24)

apache-2.0

tomquirk/linkedin-api

👨‍💼 LinkedIn API for Python

2,247 (+22)

mit

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

16,126 (+18)

apache-2.0

bchavez/Bogus

:card_index: A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

8,939 (+18)

IBM/data-prep-kit

Open source project for data preparation of LLM application builders

370 (+16)

apache-2.0

Last week (relative gain)

theseus-rs/rsql

Command line SQL interface for relational databases and common data file formats

158 (+18%)

apache-2.0

chase-manning/pokemon-tcg-pocket-cards

An open source repo for data on the Pokemon TCG Cards

39 (+15%)

mit

tradewelltech/beavers

Python stream processing for analytics

29 (+12%)

apache-2.0

xefi/faker-php

Generate fake data on demand.

53 (+10%)

apache-2.0

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

38 (+9%)

mit

ebonnal/streamable

[Python] Stream-like manipulation of iterables.

172 (+8%)

apache-2.0

Shorya22/Data-Analytics-Projects

59 (+5%)

trailheadapps/coral-cloud

Sample application that showcases Data Cloud, Agents and Prompts.

43 (+5%)

cc0-1.0

IBM/data-prep-kit

Open source project for data preparation of LLM application builders

370 (+5%)

apache-2.0

scrapyman/data-api

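Scrapyman data API service. Provides: Taobao, Xiaohongshu, JD.com, Douyin (e-commerce), Douyin (video), Kuaishou, Pugongying, Xingtu, Pinduoduo, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, Shopee, SHEIN, Baidu Index, Ctrip, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook, YouTube, Instagram, Twitter. Crawler, data collection, scrapy, interface, API.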

102 (+4%)

TheJaeLal/LineFormer

Line Chart Data Extraction: Official code for LineFormer - ICDAR23 Paper

27 (+4%)

anchore/vunnel

Tool for collecting vulnerability data from various sources (used to build the grype database)

82 (+4%)

apache-2.0

microsoft/A-TALE-OF-THREE-CITIES

Analyzing the safety (311) dataset published by Azure Open Datasets for Chicago, Boston and New York City using SparkR, SparkSQL, Azure Databricks, visualization using ggplot2 and leaflet. Focus is on...

85 (+4%)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+4%)

bsd-3-clause

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+4%)

agpl-3.0

felikcat/unlimited-hotspot

Removes speed restrictions on your hotspot internet (iOS, iPadOS, Android, Quectel), and allows hotspots on any plan (rooted Android & Quectel only).

153 (+3%)

wtfpl

luanborelli/ipeadatapy

ipeadatapy is a data and metadata extraction package made in Python using the official Ipeadata database API. In essence, it is an API wrapper.

73 (+3%)

mit

block-mesh/block-mesh-monorepo

No description

39 (+3%)

dbt-labs/jaffle_shop_duckdb

Get started with dbt in less than 1 minute from `git clone` to `dbt docs serve` for free!

161 (+3%)

apache-2.0

ContextData/VectorETL

Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications

84 (+2%)

mit

Last month (new repositories)

Piazza-tech/Piazza-Updater

Piazza-Updater automates updates to a Weaviate database with real-time vectorial data. By continuously searching the internet and integrating with Verba repositories, it enhances retrieval-augmented g...

mit

Last month (absolute gain)

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

23,565 (+4,125)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+1,497)

agpl-3.0

run-llama/llama_index

LlamaIndex is a data framework for your LLM applications

37,479 (+629)

mit

TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

43,031 (+449)

mit

metabase/metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:

39,178 (+349)

akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! Open-source financial data interface library.

9,809 (+332)

mit

PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

17,837 (+318)

apache-2.0

airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

16,518 (+294)

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+276)

bsd-3-clause

Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

13,815 (+276)

faker-js/faker

Generate massive amounts of fake data in the browser and node.js

13,208 (+199)

vercel/swr

React Hooks for Data Fetching

30,745 (+171)

mit

dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

2,835 (+157)

apache-2.0

lakehq/sail

LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.

587 (+134)

apache-2.0

tomquirk/linkedin-api

👨‍💼 LinkedIn API for Python

2,247 (+113)

mit

SheetJS/sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

35,267 (+109)

apache-2.0

tinyplex/tinybase

The reactive data store for local‑first apps.

3,957 (+105)

mit

speedyapply/2025-AI-College-Jobs

2025 AI/ML internship & new graduate job list updated daily

527 (+103)

truefoundry/cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

3,424 (+100)

apache-2.0

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,373 (+96)

mit

Last month (relative gain)

xefi/faker-php

Generate fake data on demand.

53 (+960%)

apache-2.0

microsoft/A-TALE-OF-THREE-CITIES

85 (+554%)

mit

chase-manning/pokemon-tcg-pocket-cards

An open source repo for data on the Pokemon TCG Cards

39 (+144%)

mit

ebonnal/streamable

[Python] Stream-like manipulation of iterables.

172 (+31%)

apache-2.0

lakehq/sail

LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.

587 (+30%)

apache-2.0

Shorya22/Data-Analytics-Projects

59 (+26%)

speedyapply/2025-AI-College-Jobs

2025 AI/ML internship & new graduate job list updated daily

527 (+24%)

trailheadapps/coral-cloud

Sample application that showcases Data Cloud, Agents and Prompts.

43 (+23%)

cc0-1.0

theseus-rs/rsql

Command line SQL interface for relational databases and common data file formats

158 (+22%)

apache-2.0

block-mesh/block-mesh-monorepo

No description

39 (+22%)

IBM/data-prep-kit

Open source project for data preparation of LLM application builders

370 (+22%)

apache-2.0

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

23,565 (+21%)

ylem-co/ylem

Ylem is an open-source platform for real-time data streaming orchestration

63 (+21%)

apache-2.0

nomihq/nomi

Nomi enables people to use computers more simply.

72 (+20%)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+19%)

bsd-3-clause

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

52 (+18%)

buster-so/buster

The open-source, AI-native data stack

61 (+17%)

robustmq/robustmq

RobustMQ is a next-generation, high-performance, cloud-native, converged message queue that is compatible with multiple mainstream message queuing protocols and has complete Serverless capabilities.

219 (+16%)

apache-2.0

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

38 (+15%)

mit

scrapyman/data-api

102 (+15%)

Last 12-months (new repositories)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357

agpl-3.0

apple/pkl

A configuration as code language with rich validation and tooling.

10,442

apache-2.0

pretzelai/pretzelai

The modern replacement for Jupyter Notebooks

2,038

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730

bsd-3-clause

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,373

mit

LazyAGI/LazyLLM

Easiest and laziest way for building multi-agent LLMs applications.

1,044

apache-2.0

amphi-ai/amphi-etl

Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.

945

latitude-dev/latitude

Developer-first embedded analytics

888

lgpl-3.0

Litlyx/litlyx

Powerful Analytics Solution. Setup in 30 seconds. Display all your data on a Simple, AI-powered dashboard. Fully self-hostable and GDPR compliant.

796

apache-2.0

pgflo/pg_flo

Stream, transform, and route PostgreSQL data in real-time.

720

apache-2.0

NVIDIA/NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

682

apache-2.0

lakehq/sail

LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.

587

apache-2.0

speedyapply/2025-AI-College-Jobs

2025 AI/ML internship & new graduate job list updated daily

527

princeton-nlp/LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

390

mit

IBM/data-prep-kit

Open source project for data preparation of LLM application builders

370

apache-2.0

rpbouman/huey

Light-weight, browser-based ROLAP pivot tables on top of DuckDB-WASM

286

mit

bosun-ai/swiftide

Fast, streaming indexing, query, and agent library for building LLM applications in Rust

282

mit

DeDolphins/DataHorse

Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.

245

mit

stocknear/backend

Backend of stocknear - Open Source Stock Analysis

204

agpl-3.0

stocknear/frontend

UI of stocknear - Open Source Stock Analysis

180

agpl-3.0

Last 12-months (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+20,356)

agpl-3.0

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

23,565 (+19,028)

run-llama/llama_index

LlamaIndex is a data framework for your LLM applications

37,479 (+12,018)

mit

apple/pkl

A configuration as code language with rich validation and tooling.

10,442 (+10,441)

apache-2.0

TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

43,031 (+5,468)

mit

Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

13,815 (+4,521)

PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

17,837 (+4,224)

apache-2.0

airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

16,518 (+3,928)

metabase/metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:

39,178 (+3,927)

truefoundry/cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

3,424 (+3,421)

apache-2.0

DeepInsight-AI/DeepBI

LLM based data scientist, AI native data application. AI-driven infinite thinking redefines BI.

2,410 (+2,356)

mit

faker-js/faker

Generate massive amounts of fake data in the browser and node.js

13,208 (+2,234)

vercel/swr

React Hooks for Data Fetching

30,745 (+2,154)

mit

akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! Open-source financial data interface library.

9,809 (+2,100)

mit

pretzelai/pretzelai

The modern replacement for Jupyter Notebooks

2,038 (+2,011)

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,028 (+1,923)

apache-2.0

dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

2,835 (+1,861)

apache-2.0

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+1,728)

bsd-3-clause

flyteorg/flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

5,859 (+1,593)

apache-2.0

superduper-io/superduper

Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.

4,874 (+1,456)

apache-2.0

Last 12-months (relative gain)

pgflo/pg_flo

Stream, transform, and route PostgreSQL data in real-time.

720 (+17,900%)

apache-2.0

speedyapply/2025-AI-College-Jobs

2025 AI/ML internship & new graduate job list updated daily

527 (+10,440%)

pretzelai/pretzelai

The modern replacement for Jupyter Notebooks

2,038 (+7,448%)

DeDolphins/DataHorse

Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.

245 (+4,800%)

mit

DeepInsight-AI/DeepBI

LLM based data scientist, AI native data application. AI-driven infinite thinking redefines BI.

2,410 (+4,363%)

mit

ebonnal/streamable

[Python] Stream-like manipulation of iterables.

172 (+4,200%)

apache-2.0

robustmq/robustmq

RobustMQ is a next-generation, high-performance, cloud-native, converged message queue that is compatible with multiple mainstream message queuing protocols and has complete Serverless capabilities.

219 (+3,550%)

apache-2.0

DataRecce/recce

The data-validation toolkit for enhanced dbt (data build tool) PR review

278 (+3,375%)

apache-2.0

neurallambda/awesome-reasoning

A curated list of data for reasoning AI

115 (+2,775%)

DahnJ/Awesome-Zarr

🎀 Awesome Zarr resources

80 (+1,900%)

cc0-1.0

NVIDIA/NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

682 (+1,794%)

apache-2.0

Vinyzu/chrome-fingerprints

A collection of 10,000 Windows Chrome fingerprints. Usable with an easy-to-use API, available as a compressed (lzma) or full-size JSON (view Releases). It's just 1.4 MB in size in compressed f...

186 (+1,760%)

gpl-3.0

rpbouman/huey

Light-weight, browser-based ROLAP pivot tables on top of DuckDB-WASM

286 (+1,582%)

mit

ArslanS1997/Auto-Analyst

AI data scientist

180 (+1,536%)

mit

pracdata/awesome-open-source-data-engineering

A curated list of open source tools used in analytics platforms and data engineering ecosystem

165 (+1,400%)

scrapyman/data-api

102 (+1,357%)

NoOPeEKS/DataNvim

A fully-featured batteries-included Neovim distribution for the world of Data Science. Prepared to run code and interact with Jupyter Notebooks without ever leaving your terminal.

84 (+1,300%)

gpl-3.0

CrunchyData/pgCompare

pgCompare – a straightforward utility crafted to simplify the data comparison process, providing a robust solution for comparing data across various database platforms.

104 (+1,200%)

apache-2.0

dbt-labs/jaffle-shop

🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.

123 (+1,130%)

IBM/unitxt

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

165 (+1,000%)

apache-2.0