Trending repositories for topic data
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
This is a repo with links to everything you'd ever want to learn about data engineering
🐵 Preswald is a full-stack platform for building, deploying, and managing interactive data applications. It brings ingestion, storage, transformation, and visualization into a simple SDK, minimizing ...
LlamaIndex is the leading framework for building LLM-powered agents over your data.
Manuscript is a revolutionary blockchain data streaming framework. With Manuscript, you can seamlessly integrate on-chain and off-chain data into target data storage for unrestricted querying and anal...
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.
2025 AI/ML internship & new graduate job list updated daily
The easiest and laziest way to build multi-agent LLM applications.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
A portable accelerated data query and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
Open source project for data preparation for LLM application builders
This repository contains real-life use cases of GenAI (LLM+RAG) in the finance domain. It covers many project use cases with theory and projects.
AG Charts is a fully-featured and highly customizable JavaScript charting library. The professional choice for developers building enterprise applications
pgCompare – a straightforward utility crafted to simplify the data comparison process, providing a robust solution for comparing data across various database platforms.
An open source repo for data on the Pokemon TCG Cards
🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.
RobustMQ is a next-generation, high-performance, cloud-native, converged message queue that is compatible with multiple mainstream message queuing protocols and has complete Serverless capabilities.
One downloader for many scientific data and code repositories! DOI 👐 Data
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
Xpert AI is an AI agents and data analysis platform for enterprises to make business decisions.
A collection of samples, best practices and reference architectures for implementing SaaS applications on AWS for databases and data services.
A curated list of open source tools used in analytics platforms and data engineering ecosystem
A project providing a Graphic Walker Pane for use with HoloViz Panel.
An open source alternative to Tableau. Embeddable visual analytics
A free and open-source online course for anyone interested in data analysis and data visualization with JavaScript/TypeScript.
Fast, streaming indexing, query, and agentic LLM applications in Rust
🕷️ Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Powerful Analytics Solution. Setup in 30 seconds. Display all your data on a Simple, AI-powered dashboard. Fully self-hostable and GDPR compliant. Alternative to Google Analytics, MixPanel, Plausible,...
Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.
A configuration as code language with rich validation and tooling.
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
🧙 Build, run, and manage data pipelines for integrating and transforming data.
LLM-based data scientist and AI-native data application. AI-driven infinite thinking redefines BI.
The data-validation toolkit for enhanced dbt (data build tool) PR review
Scalable data preprocessing and curation toolkit for LLMs
Light-weight, browser-based ROLAP pivot tables on top of DuckDB-WASM
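Scrapyman data API service. Provides: Taobao, Xiaohongshu, JD.com, Douyin (e-commerce), Douyin (video), Kuaishou, Pugongying, Xingtu, Pinduoduo, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, Shopee, SHEIN, Baidu Index, Ctrip, Boss Zhipin, Zhilian Zhaopin, Lagou, Toutiao, Facebook, YouTube, Instagram, Twitter. Crawler, data collection, scrapy, interface, API.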
A fully-featured batteries-included Neovim distribution for the world of Data Science. Prepared to run code and interact with Jupyter Notebooks without ever leaving your terminal.
🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.
This repository contains the full dataset of AWS IAM data (services, actions, resource types and conditions keys). It's updated on a daily basis at 4AM UTC.