PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
firecrawl
firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

6.8M 114K 7K
vi3k6i5
flashtext

Extract Keywords from sentence or Replace keywords in sentences.

2.3M 6K 598
firecrawl
firecrawl

🔥 The API to search, scrape, and interact with the web for AI

677K 114K 7K
thinh-vu
vnstock

A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone

601K 1K 275
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

564K 44K 4K
scrapfly
scrapfly-sdk

Official Python SDK for the Scrapfly platform: web scraping, screenshots, AI extraction, crawling, and a remote anti-bot browser. Integrates with Scrapy, LlamaIndex, and LangChain.

313K 55 15
hhursev
recipe-scrapers

Python package for scraping recipes data

85K 2K 643
yfedoseev
pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

81K 717 78
a-maliarov
amazoncaptcha

Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.

71K 490 91
jpjacobpadilla
stealth-requests

Undetected web-scraping & seamless HTML parsing in Python!

58K 467 48
linw1995
jsonpath-extractor

A query expression for extracting data from JSON.

17K 41 4
shcherbak-ai
contextgem

ContextGem: Effortless LLM extraction from documents

9K 2K 155
AIMLPM
markcrawl

Fast Python web crawler for RAG and AI ingestion. Extracts clean Markdown from any site for LLMs and vector stores.

8K 2 0
thinh-vu
vnstock3

A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone

7K 1K 275
nppoly
cyac

High performance Trie and Ahocorasick automata (AC automata) Keyword Match & Replace Tool for Python. Correct case-insensitive implementation!

7K 94 15
ironmussa
optimuspyspark

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

6K 2K 232
html-extract
hext

A module and command-line utility to extract structured data from HTML

5K 55 3
aborruso
scrape-cli

Extract HTML elements from the command line using CSS selectors or XPath. Pipe-friendly Python CLI.

5K 26 1
StabRise
scaledp

ScaleDP is an Open-Source extension of Apache Spark for Document Processing

5K 18 1
us
crw

Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v1/scrape, /v1/crawl, /v1/search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.

3K 71 5
gambolputty
wiktionary-de-parser

Extracts data from German Wiktionary dump files.

3K 26 8
nordinz7
maybankpdf2json

A package for extracting JSON data from Maybank PDF account statements

2K 1 0
kaya70875
ytfetcher

⚡ Build structured YouTube datasets at scale — effortlessly fetch transcripts and rich metadata for NLP, ML, and AI workflows.

2K 70 11
omniologynow-rgb
scout-intel-mcp

The Google for AI agents — ask Claude to research any company, analyze competitors, track market trends, and score data quality. 6 intelligence tools, 5+ data sources (DuckDuckGo, NewsAPI, Wikipedia, web scraping), confidence-scored structured JSON. pip install scout-intel-mcp

2K 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery