18 results found

2.0k
25.9k
agpl-3.0
134
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
Created 2024-02-29
2,330 commits to master branch, last one 4 days ago
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
Created 2010-11-25
592 commits to master branch, last one 2 months ago
570
6.5k
agpl-3.0
66
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Created 2012-10-06
2,782 commits to main branch, last one 3 days ago
171
2.0k
mit
13
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Created 2021-06-21
11,770 commits to main branch, last one 14 hours ago
41
1.2k
other
10
Open-source platform for extracting structured data from documents using AI.
Created 2024-11-17
50 commits to main branch, last one 6 days ago
115
1.0k
apache-2.0
20
Crawly, a high-level web crawling & scraping framework for Elixir.
Created 2019-03-09
320 commits to master branch, last one 5 months ago
80
676
bsd-3-clause
24
Extract structured data from web sites. Web site scraping.
Created 2017-02-09
885 commits to master branch, last one 4 years ago
A simple resume parser used for extracting information from resumes
Created 2018-12-11
52 commits to master branch, last one 3 years ago
8
95
unknown
2
Turn a webpage into LLM-friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, and image and webpage link extraction easy.
Created 2024-07-27
23 commits to main branch, last one 8 days ago
25
82
unknown
13
An R package for acquisition and processing of NASA SMAP data
Created 2016-05-11
304 commits to master branch, last one about a year ago
Library and CLI for extracting data from HTML via CSS selectors
Created 2016-01-10
214 commits to master branch, last one 4 months ago
21
64
apache-2.0
8
FBLYZE is a Facebook scraping system and analysis system.
Created 2016-12-21
233 commits to master branch, last one 6 years ago
Get lyrics for any song in less than 2 seconds by just passing in the song name (spelled correctly or misspelled) using this Python library.
Created 2019-01-14
20 commits to master branch, last one 4 years ago
Extracting and parsing structured data with jQuery Selector, XPath or JsonPath from common web formats like HTML, XML and JSON.
Created 2015-12-25
198 commits to master branch, last one 2 years ago
This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.
Created 2021-01-08
11 commits to master branch, last one 2 years ago
14
43
gpl-3.0
3
Unofficial Python client for Twitter
This repository has been archived
Created 2019-10-14
39 commits to master branch, last one 4 years ago
A tool to replace data in a Unity Asset Bundle from modified files.
Created 2021-05-24
96 commits to main branch, last one 2 years ago
2
41
unknown
3
Extract structured data from any unstructured web page
Created 2024-01-28
22 commits to main branch, last one 10 months ago