Trending repositories for topic data
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
This is a repo with links to everything you'd ever want to learn about data engineering
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
AKShare is an elegant and simple financial data interface library for Python, built for human beings! Open-source financial data interface library.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Command line SQL interface for relational databases and common data file formats
📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
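Book_6_《数据有道》 (Data Has Its Way) | Iris Book series (鸢尾花书): from basic arithmetic to machine learning. Feedback and corrections are welcome! Readers who submit the most corrections will receive a free copy of the book as thanks!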
A curated list of awesome big data frameworks, resources and other awesomeness.
An open source repo for data on the Pokemon TCG Cards
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Sample application that showcases Data Cloud, Agents and Prompts.
Explore a collection of end-to-end data analytics projects showcasing SQL, Python, and Power BI. Gain valuable insights and solutions to real-world problems through data extraction, analysis, and visu...
Open-source project for data preparation, aimed at LLM application builders
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Scalable data preprocessing and curation toolkit for LLMs
Home of the Open Data Contract Standard (ODCS).
Community list of data & technology resources concerning the built environment and communities. 🏙️🌳🚌🚦🗺️
Removes speed restrictions on your hotspot internet (iOS, iPadOS, Android, Quectel) and allows hotspots on any plan (rooted Android & Quectel only).
Get started with dbt in less than 1 minute from `git clone` to `dbt docs serve` for free!
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in ...
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
The official home of the Presto distributed SQL query engine for big data
Data Engineering Project with Hadoop HDFS and Kafka
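Scrapyman data API service. Provides: Taobao, Xiaohongshu, JD.com, Douyin (e-commerce), Douyin (video), Kuaishou, Pugongying, Xingtu, Pinduoduo, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, Shopee, SHEIN, Baidu Index, Ctrip, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook, YouTube, Instagram, Twitter. Keywords: crawler, data collection, scrapy, interface, API.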
Line Chart Data Extraction: Official code for LineFormer - ICDAR23 Paper
Tool for collecting vulnerability data from various sources (used to build the grype database)
Analyzing the safety (311) dataset published by Azure Open Datasets for Chicago, Boston and New York City using SparkR, SparkSQL, Azure Databricks, visualization using ggplot2 and leaflet. Focus is on...
ipeadatapy is a data and metadata extraction package written in Python using the Ipeadata database's official API. In essence, it is an API wrapper.
Piazza-Updater automates updates to a Weaviate database with real-time vectorial data. By continuously searching the internet and integrating with Verba repositories, it enhances retrieval-augmented g...
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs
2025 AI/ML internship & new graduate job list updated daily
Ylem is an open-source platform for real-time data streaming orchestration
🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.
RobustMQ is a next-generation, high-performance, cloud-native, converged message queue that is compatible with multiple mainstream message queuing protocols and has complete Serverless capabilities.
The easiest and laziest way to build multi-agent LLM applications.
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Powerful Analytics Solution. Setup in 30 seconds. Display all your data on a Simple, AI-powered dashboard. Fully self-hostable and GDPR compliant.
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Fast, streaming indexing, query, and agent library for building LLM applications in Rust
Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.
A configuration as code language with rich validation and tooling.
LLM-based data scientist, AI-native data application. AI-driven infinite thinking redefines BI.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
The data-validation toolkit for enhanced dbt (data build tool) PR review
A collection of 10,000 Windows Chrome fingerprints. Usable with an easy-to-use API, available as compressed (lzma) or full-size JSON (see Releases). It's just 1.4 MB in size in compressed f...
Light-weight, browser-based ROLAP pivot tables on top of DuckDB-WASM
A fully-featured batteries-included Neovim distribution for the world of Data Science. Prepared to run code and interact with Jupyter Notebooks without ever leaving your terminal.
pgCompare – a straightforward utility crafted to simplify the data comparison process, providing a robust solution for comparing data across various database platforms.
🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation