PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
adbar
courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

7.1M 169 13
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.4M 62K 12K
codelucas
newspaper3k

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

999K 15K 2K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

564K 44K 4K
apify
crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

538K 9K 712
clemfromspace
scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

26K 953 353
ArchiveBox
abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

25K 111 5
scrapinghub
spidermon

Scrapy Extension for monitoring spiders execution.

21K 555 103
kreuzberg-dev
kreuzcrawl

High-performance web crawling engine with bindings for 11 languages

14K 84 10
dsdanielpark
arxiv2text

Converting PDF files to text, mainly with a focus on arXiv papers.

13K 24 2
lorien
grab

Web Scraping Framework

6K 2K 278
alephdata
memorious

Lightweight web scraping toolkit for documents and structured data.

4K 315 64
codelucas
newspaper

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

3K 15K 2K
scrapinghub
scrapyrt

HTTP API for Scrapy spiders

3K 880 162
crawlbase-source
crawlbase

Fast python library for the Crawlbase API

2K 25 2
ihandmine
aioscpy

An asyncio + aiolibs crawler that imitates the Scrapy framework

2K 115 10
Abdulrahman-Elsmmany
docscrape

Scrape any documentation site to Markdown in seconds

1K 0 0
proxycrawl
proxycrawl

ProxyCrawl Python library for scraping and crawling

1K 58 19
bluet
proxybroker2

The New (auto rotate) Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

1K 992 136
NationalLibraryOfNorway
maalfrid-toolkit

Toolkit for the Målfrid project

1K 2 1
iawia002
lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

1K 805 140
zaidkx37
shopscout

Scrape any Shopify store - products, collections, pages & metadata from the public JSON API. No API key needed. SDK + CLI + REST API.

1K 1 0
lorey
mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

1K 1K 92
rivermont
spidy-web-crawler

Spidy is the simple, easy to use command line web crawler.

734 352 69
    • Data from PyPI, GitHub, ClickHouse, and BigQuery