PyPI Stats

Extract Data Python Packages

Python packages with the GitHub topic extract-data. Sorted by relevance, with stars and monthly downloads.
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

78.7M 10K 718
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.4M 10K 718
meltano
meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

1.1M 2K 235
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

283K 62K 5K
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

76K 62K 5K
yuanxu-li
html-table-extractor

Extract data from HTML tables.

27K 88 22
opendatalab
mineru-selfhosted-mcp

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

4K 62K 5K
ayush571995
extract-zip

Extract all files from a zip file, including files that are themselves zip archives, by running this script.

2K 0 0
umLu
tubeframes

A Python package for retrieving YouTube data, including video statistics, captions, and channel information. TubeFrames outputs results in a user-friendly pandas DataFrame format, making it ideal for data analysis workflows — especially in Jupyter Notebooks.

2K 3 0
MeltanoLabs
tap-dbt

Singer tap for dbt, built with the Singer SDK.

1K 12 8
AdemBoukhris457
doctra

Parse, extract, and analyze documents with ease

1K 204 33
MeltanoLabs
tap-stackexchange

Singer tap for StackExchange, built with the Meltano SDK for Singer Taps.

784 3 1
usercando
pullcite

Evidence-backed structured extraction. Pull data from documents with proof of where each value came from.

742 1 0
Techcatchers
lyrics-extractor

Get the lyrics for any song in less than 2 seconds by passing in the song name (spelled correctly or misspelled) with this Python library.

554 60 18
rodricios
wxpath

wxpath - a declarative web crawler and data extractor

442 110 5
apurvasijaria
googleplaystorescrape

Python module to extract Google Play Store reviews and other information for any Android app.

362 4 0
brunneis
bluebird

Unofficial Python client for Twitter

261 44 14
ammaryasirnaich
pyreqify

A module to extract Python package dependencies from .py and .ipynb files.

257 0 0
izikeros
todo-extract

Script for extracting TODO notes from a text file.

235 2 0
opendatalab
xh-pdf-parser

A practical tool for converting PDF to Markdown

184 62K 5K
pymupdf
aqpymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

162 10K 718
KhalidCK
db2table

Convert a SQLite database file to a simple, web-friendly format.

131 0 1
brunneis
polypus

Unofficial Python client for Twitter

89 44 14
opendatalab
lazyllm-magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

81 62K 5K
    • Data from PyPI, GitHub, ClickHouse, and BigQuery