PyPI Stats

Crawling Python Packages

Python packages tagged with the GitHub topic "crawling", sorted by relevance and listed with their monthly downloads and GitHub stars.
adbar
courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

7.4M 169 13
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K
codelucas
newspaper3k

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

1M 15K 2K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

612K 47K 4K
apify
crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

541K 9K 712
clemfromspace
scrapy-selenium

Scrapy middleware to handle JavaScript pages using Selenium

26K 953 353
ArchiveBox
abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

22K 111 5
scrapinghub
spidermon

Scrapy extension for monitoring spider execution.

21K 555 103
kreuzberg-dev
kreuzcrawl

High-performance web crawling engine with bindings for 11 languages

15K 84 10
dsdanielpark
arxiv2text

Converting PDF files to text, mainly with a focus on arXiv papers.

13K 24 2
lorien
grab

Web Scraping Framework

6K 2K 278
alephdata
memorious

Lightweight web scraping toolkit for documents and structured data.

4K 315 64
codelucas
newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

3K 15K 2K
scrapinghub
scrapyrt

HTTP API for Scrapy spiders

3K 881 162
crawlbase-source
crawlbase

Fast Python library for the Crawlbase API

2K 25 2
ihandmine
aioscpy

An asyncio + aiolibs crawler framework that imitates Scrapy

2K 115 10
proxycrawl
proxycrawl

ProxyCrawl Python library for scraping and crawling

2K 58 19
Abdulrahman-Elsmmany
docscrape

Scrape any documentation site to Markdown in seconds

1K 0 0
lorey
mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

1K 1K 92
bluet
proxybroker2

The New (auto rotate) Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

1K 992 136
NationalLibraryOfNorway
maalfrid-toolkit

Toolkit for the Målfrid project

1K 2 1
zaidkx37
shopscout

Scrape any Shopify store - products, collections, pages & metadata from the public JSON API. No API key needed. SDK + CLI + REST API.

1K 1 0
iawia002
lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

1K 805 140
proxymesh
scrapy-proxy-headers

Handle custom proxy headers when making HTTPS requests through proxies in Scrapy

744 4 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery