Trending repositories for topic data
This is a repo with links to everything you'd ever want to learn about data engineering
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
RAG (Retrieval-Augmented Generation) framework by TrueFoundry for building modular, open-source applications for production
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (An open-source financial data interface library.)
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Open-source project for data preparation, aimed at LLM application builders
Ylem is an open-source platform for real-time data streaming orchestration
A project providing a Graphic Walker Pane for use with HoloViz Panel.
2025 AI/ML internship & new graduate job list updated daily
Removes speed restrictions on your hotspot internet (iOS, iPadOS, Android, Quectel) and allows hotspots on any plan (rooted Android & Quectel only).
Introductory guide to the art and science of data visualisation. Insights, advice, and examples (with code) to make data outputs more readable, accessible, and impactful.
Secure destruction of sensitive virtual data, temporary files and swap partitions
🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.
You can find links to data acquisition websites.
A massive list including a huge amount of products and services that are completely free!
Scalable data preprocessing and curation toolkit for LLMs
A configuration as code language with rich validation and tooling.
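Book_6_《数据有道》 (The Way of Data) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning. Criticism and corrections are welcome! Readers who report the most errors will receive a complimentary book as thanks!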
Transforms PDFs, documents and images into enriched structured data
Browser-only utils for sharing/synchronizing data using "animated" QR codes
This code is used to perform web scraping and data extraction from Google Maps. It is particularly designed for obtaining information about businesses, including their name, address, website, phone nu...
A tool for easily extracting front matter out of a string. It is a fast Rust implementation of gray-matter. Parses YAML, JSON and TOML, and supports custom parsers. Use it and let me know by giving it ...
Open data product with real estate listings from Idealista. The datasets cover three major cities in Spain for the year 2018. https://doi.org/10.1177/23998083241242844
🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.
AG Charts is a fully-featured and highly customizable JavaScript charting library. The professional choice for developers building enterprise applications
All-in-one analytics solution. Set up in 30 seconds. Display all your data on an AI-powered dashboard. Fully self-hostable and GDPR compliant.
The easiest and laziest way to build multi-agent LLM applications.
A curated list of open-source tools used in analytics platforms and the data engineering ecosystem
Sample application that showcases Data Cloud, Agents and Prompts.
Data Engineering Project with Hadoop HDFS and Kafka
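Scrapyman data API service. Covers: Taobao, Xiaohongshu, JD.com, Douyin (e-commerce), Douyin (video), Kuaishou, Pugongying, Xingtu, Pinduoduo, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, Shopee, SHEIN, Baidu Index, Ctrip, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook, YouTube, Instagram, Twitter. Crawlers, data collection, scrapy, interfaces, API.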
Game Mobile Foundation (Android + iOS) Using Unity3D. Simple, Fast and no GC
Explore a collection of end-to-end data analytics projects showcasing SQL, Python, and Power BI. Gain valuable insights and solutions to real-world problems through data extraction, analysis, and visu...
pgCompare – a straightforward utility crafted to simplify the data comparison process, providing a robust solution for comparing data across various database platforms.
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Yet another repository with basic concepts, technical challenges and resources on data engineering, in Spanish 🧙✨
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Fast, streaming indexing and query library for AI (RAG) applications, written in Rust
Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.
A collection of 10,000 Windows Chrome fingerprints. Usable through an easy-to-use API, available as compressed (lzma) or full-size JSON (see Releases). It's just 1.4 MB in size in compressed f...
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools, without migrating your data.
LLM-based data scientist, AI-native data application. AI-driven infinite thinking redefines BI.
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
The dbt data-validation toolkit for teams that care about building better data
RobustMQ is a next-generation, high-performance, cloud-native, converged message queue that is compatible with multiple mainstream message queuing protocols and has complete Serverless capabilities.
A guide for technical professionals looking to start consulting
A fully-featured batteries-included Neovim distribution for the world of Data Science. Prepared to run code and interact with Jupyter Notebooks without ever leaving your terminal.
🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation