PyPI Stats

Crawler Python Packages

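Python packages tagged with the GitHub topic "crawler", sorted by relevance; in this list the order also matches monthly downloads, highest first. Each entry gives the GitHub owner, the PyPI package name, and the project description, followed by three figures: monthly downloads, GitHub stars, and GitHub forks.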
adbar
trafilatura

Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.4M 6K 363
adbar
courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

7.3M 169 13
firecrawl
firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

7M 114K 7K
rushter
selectolax

Python bindings to the Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

5.7M 2K 91
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K
codelucas
newspaper3k

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

1M 15K 2K
firecrawl
firecrawl

🔥 The API to search, scrape, and interact with the web for AI

718K 114K 7K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

585K 44K 4K
JoMingyu
google-play-scraper

Google Play scraper for Python, inspired by facundoolano/google-play-scraper

583K 967 246
AliAkhtari78
spotifyscraper

Spotify scraper to extract information from Spotify and download MP3s with the song's cover art

544K 252 28
apify
crawlee

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

541K 9K 712
0x676e67
rnet

An ergonomic Python HTTP client with TLS fingerprint emulation

496K 1K 104
Luqman-Ud-Din
random-user-agent

A package to get a list of user agents based on filters such as operating system, software name, etc.

428K 103 12
spider-rs
spider-client

Python, Javascript, and Rust libraries for the Spider Cloud API.

413K 25 9
moskrc
crawlerdetect

🕷 CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.

152K 44 11
fhamborg
news-please

news-please: an integrated web crawler and information extractor for news that just works

112K 2K 452
scrapy-plugins
scrapy-zyte-api

Zyte API integration for Scrapy

110K 40 21
jpramosi
geckordp

A Python client implementation of Firefox DevTools over the remote debugging protocol

96K 38 14
hellock
icrawler

A multithreaded crawler framework with many built-in image crawlers.

73K 921 179
hect0x7
jmcomic

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀

44K 6K 10K
rmax
scrapy-redis

Redis-based components for Scrapy.

36K 6K 2K
scrapy-plugins
scrapy-crawlera

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

33K 365 91
0x676e67
wreq

An ergonomic Python HTTP client with TLS fingerprint emulation

31K 1K 104
tn3w
is-crawler

Crawler detection from User-Agent strings in 50 ns. Issues and pull requests welcome!

29K 0 0
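Two of the packages above, crawlerdetect and is-crawler, identify bots by inspecting User-Agent strings. The core idea can be sketched with a single regular expression over tokens that commonly appear in crawler User-Agents. This is a minimal illustration, not either library's API: the function name `looks_like_crawler` and the short token list are hypothetical, and real detectors ship much larger, regularly updated pattern sets.

```python
import re
from typing import Optional

# Hypothetical, deliberately short list of substrings that commonly appear
# in crawler User-Agent strings. Real detectors maintain far larger lists.
CRAWLER_TOKENS = [
    r"bot",              # Googlebot, bingbot, AhrefsBot, ...
    r"crawl",            # crawler, crawling
    r"spider",           # Baiduspider, ...
    r"slurp",            # Yahoo! Slurp
    r"curl/",            # command-line HTTP clients
    r"wget/",
    r"python-requests/",
    r"scrapy/",          # Scrapy's default User-Agent starts with "Scrapy/"
]

# Compile once into one case-insensitive alternation so each check is a
# single regex search rather than a loop over patterns.
_CRAWLER_RE = re.compile("|".join(CRAWLER_TOKENS), re.IGNORECASE)


def looks_like_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent string matches a known crawler token.

    A missing or blank User-Agent is treated as a crawler, on the
    assumption that ordinary browsers always send one.
    """
    if not user_agent or not user_agent.strip():
        return True
    return _CRAWLER_RE.search(user_agent) is not None
```

For example, `looks_like_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")` returns `True`, while a desktop Firefox User-Agent returns `False`. A naive substring match like `bot` can misfire on device names that happen to contain those letters, which is one reason dedicated libraries also maintain exclusion lists.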
Data from PyPI, GitHub, ClickHouse, and BigQuery.