PyPI Stats

Web Scraping Python Packages

Python packages tagged with the GitHub topic web-scraping, sorted by relevance and shown with stars and monthly downloads.
lexiforest
curl-cffi

Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

25.6M 6K 477
adbar
htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

9.6M 148 30
deedy5
primp

HTTP client that can impersonate web browsers

9.1M 517 54
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.4M 6K 363
firecrawl
firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

7M 114K 7K
rushter
selectolax

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

5.7M 2K 91
Kaliiiiiiiiii-Vinyzu
patchright

Undetected Python version of the Playwright testing and automation library.

4.9M 1K 96
seleniumbase
seleniumbase

APIs for browser automation, testing, and bypassing bot-detection.

3.5M 13K 2K
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K
serpapi
google-search-results

Python package (installable with pip) for retrieving Google Search results via SERP API.

1.6M 736 121
firecrawl
firecrawl

🔥 The API to search, scrape, and interact with the web for AI

718K 114K 7K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

585K 44K 4K
apify
crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

541K 9K 712
0x676e67
rnet

An ergonomic Python HTTP client with TLS fingerprinting

496K 1K 104
spider-rs
spider-client

Python, JavaScript, and Rust libraries for the Spider Cloud API.

413K 25 9
scrapfly
scrapfly-sdk

Official Python SDK for the Scrapfly platform: web scraping, screenshots, AI extraction, crawling, and a remote anti-bot browser. Integrates with Scrapy, LlamaIndex, and LangChain.

313K 55 15
ScrapeGraphAI
scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

296K 71 14
lit26
finvizfinance

Finviz analysis Python library.

139K 1K 228
CloakHQ
cloakbrowser

Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.

126K 2K 125
arman-bd
httpmorph

httpmorph is a drop-in replacement for Python's requests library that uses a custom C implementation with BoringSSL instead of Python's standard HTTP stack.

100K 145 3
rebrowser
rebrowser-playwright

A drop-in replacement for playwright-python patched with rebrowser-patches. It is designed to pass modern automation detection tests.

78K 99 10
steel-dev
steel-sdk

The official Python library for the Steel API

72K 32 5
seleniumbase
pytest-sbase

APIs for browser automation, testing, and bypassing bot-detection.

70K 13K 2K
seleniumbase
sbase

APIs for browser automation, testing, and bypassing bot-detection.

62K 13K 2K
Data from PyPI, GitHub, ClickHouse, and BigQuery