Trending repositories for topic data
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
This is a repo with links to everything you'd ever want to learn about data engineering
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Enterprise-grade toolkit for teams to continuously optimize compound AI systems, from pre to post-production
🚀 Glide Data Grid is a no-compromise, outrageously react-fast data grid with rich rendering, first-class accessibility, and full TypeScript support.
LLM-based data scientist and AI-native data application. AI-driven infinite thinking redefines BI.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (An open-source financial data interface library.)
Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5/4, Anthropic, VertexAI) and RAG.
A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets across databases, data warehouses, and data ...
Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.
Fast, streaming indexing and query library for AI (RAG) applications, written in Rust
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
LBA tools (hd_write_verify & hd_write_verify_dump) are very useful for testing storage stability and verifying data consistency; they are much better than the verification functions of FIO & vdbench. For exam...
Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Remove speed restrictions on your hotspot internet (iOS, iPadOS, Android, Quectel) and allow hotspots on any plan (rooted Android & Quectel only).
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
A massive list including a huge amount of products and services that are completely free!
Generate trends for your models. Easily generate charts or reports.
📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs
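Scrapyman data interface service. Provides data from: Taobao, Xiaohongshu, JD.com, Douyin (e-commerce), Douyin (video), Kuaishou, Pugongying, Xingtu, Pinduoduo, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, Shopee, Baidu Index, Ctrip, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook, YouTube, Instagram, Twitter. Crawler, scraping, scrapy, interface, API.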
This repository showcases my data analytics & engineering skills, shares projects, and tracks my progress.
Bridge Four is a simple, functional, effectful, single-leader, multi-worker, distributed compute system optimized for embarrassingly parallel workloads.
GeoNetwork UI is a suite of Applications made to provide a modern facade to your GeoNetwork 4 catalog. It also provides Web Components to embed various parts of your data catalog in third party websi...
Client-side only patch that allows you to unlock ALL cosmetics (+ emotes) in the Essential mod. Works on every version of Essential MC (1.8.9 - 1.20.6).
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
Command line interface for DuckDB, LibSQL, MariaDB, MySQL, PostgreSQL, Snowflake, SQLite3 and SQL Server
Open-source project for data preparation, aimed at LLM application builders
A blazing-fast DuckDB wrapper built with the V language, making it easier to leverage its power in your projects.
A DuckDB-powered command line interface for Snowflake security, governance, operations, and cost optimization.
Analytics for developers. Set up analytics in 30 seconds with just one line of code. Display all your data on an AI-powered dashboard. Fully self-hostable and GDPR compliant.
Domain modeling with plain structs and type-safe relationships for smooth data handling.
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Python-based Low-code ETL for data manipulation and transformation. Generates Python code you can deploy anywhere.
One more repository with basic concepts, technical challenges, and resources on data engineering in Spanish 🧙✨
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
The dbt data-validation toolkit for teams that care about building better data
A collection of 10,000 Windows Chrome fingerprints. Usable with an easy-to-use API, available as compressed (lzma) or full-size JSON (see Releases). It's just 1.4 MB in size in compressed f...
A configuration as code language with rich validation and tooling.
Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data. Including streaming inference, scalable model hosting, ...
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
🛠️ Tools for working with data effectively - data contracts using types, schemas, domain validation rules, type-safe casting, and more.
AG Charts is a fully-featured and highly customizable JavaScript charting library. The professional choice for developers building enterprise applications
A guide for technical professionals looking to start consulting
🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation
Scalable data preprocessing and curation toolkit for LLMs
A fully-featured batteries-included Neovim distribution for the world of Data Science. Prepared to run code and interact with Jupyter Notebooks without ever leaving your terminal.
pgCompare – a straightforward utility crafted to simplify the data comparison process, providing a robust solution for comparing data across various database platforms.
A curated list of open source tools used in analytical stacks and data engineering ecosystem