Web Scraping Python Packages

curl-cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP/2 fingerprints.

25.6M 6K 477

htmldate

Fast and robust date extraction from web pages, from Python or on the command line

9.6M 148 30

primp

HTTP client that can impersonate web browsers

9.1M 517 54

trafilatura

Python and command-line tool to gather text and metadata on the web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML

7.4M 6K 363

firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

7M 114K 7K

selectolax

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

5.7M 2K 91

patchright

Undetected Python version of the Playwright testing and automation library.

4.9M 1K 96

seleniumbase

APIs for browser automation, testing, and bypassing bot-detection.

3.5M 13K 2K

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K

google-search-results

Google search results via SerpApi, as a pip-installable Python package

1.6M 736 121

firecrawl

🔥 The API to search, scrape, and interact with the web for AI

718K 114K 7K

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

585K 44K 4K

crawlee

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP, in both headful and headless mode, with proxy rotation.

541K 9K 712

rnet

An ergonomic Python HTTP Client with TLS fingerprint

496K 1K 104

spider-client

Python, JavaScript, and Rust libraries for the Spider Cloud API.

413K 25 9

scrapfly-sdk

Official Python SDK for the Scrapfly platform: web scraping, screenshots, AI extraction, crawling, and a remote anti-bot browser. Integrates with Scrapy, LlamaIndex, and LangChain.

313K 55 15

scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

296K 71 14

finvizfinance

Finviz analysis Python library.

139K 1K 228

cloakbrowser

Stealth Chromium that passes every bot-detection test in its authors' suite (30/30 tests passed). Drop-in Playwright replacement with source-level fingerprint patches.

126K 2K 125

httpmorph

httpmorph is a drop-in replacement for Python's requests library that uses a custom C implementation with BoringSSL instead of Python's standard HTTP stack.

100K 145 3

rebrowser-playwright

A drop-in replacement for playwright-python patched with rebrowser-patches. It allows passing modern automation-detection tests.

78K 99 10

steel-sdk

The official Python library for the Steel API

72K 32 5

pytest-sbase

APIs for browser automation, testing, and bypassing bot-detection.

70K 13K 2K

sbase

APIs for browser automation, testing, and bypassing bot-detection.

62K 13K 2K