PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

77.9M 10K 718
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.3M 10K 718
meltano
meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

1.2M 2K 235
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

282K 62K 5K
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

77K 62K 5K
yuanxu-li
html-table-extractor

Extract data from HTML tables

28K 88 22
opendatalab
mineru-selfhosted-mcp

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

3K 62K 5K
ayush571995
extract-zip

Extract all files within a zip file, including files that are themselves zip archives, by running this script

2K 0 0
umLu
tubeframes

A Python package for retrieving YouTube data, including video statistics, captions, and channel information. TubeFrames outputs results in a user-friendly pandas DataFrame format, making it ideal for data analysis workflows — especially in Jupyter Notebooks.

2K 3 0
MeltanoLabs
tap-dbt

Singer tap for dbt, built with the Singer SDK.

1K 12 8
AdemBoukhris457
doctra

Parse, extract, and analyze documents with ease

1K 204 33
MeltanoLabs
tap-stackexchange

Singer tap for StackExchange, built with the Meltano SDK for Singer Taps.

774 3 1
usercando
pullcite

Evidence-backed structured extraction. Pull data from documents with proof of where each value came from.

705 1 0
Techcatchers
lyrics-extractor

Get lyrics for any song by just passing in the song name (spelled correctly or misspelled) in less than 2 seconds using this awesome Python library.

543 60 18
rodricios
wxpath

wxpath - a declarative web crawler and data extractor

380 110 5
apurvasijaria
googleplaystorescrape

Python module to extract Google Play store reviews and other information of any android app.

355 4 0
brunneis
bluebird

Unofficial Python client for Twitter

241 44 14
ammaryasirnaich
pyreqify

A module to extract Python package dependencies from .py and .ipynb files

227 0 0
izikeros
todo-extract

Script for extracting TODO notes from a text file

223 2 0
opendatalab
xh-pdf-parser

A practical tool for converting PDF to Markdown

167 62K 5K
pymupdf
aqpymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

161 10K 718
KhalidCK
db2table

Convert a sqlite db file to a simple web friendly format

124 0 1
brunneis
polypus

Social Media scraping with Python

89 44 14
SpaceShaman
deckard

Extract structured data from unstructured text — no AI, just regular expressions. 🔍

71 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery