PyPI Stats

Scraping Python Packages

Python packages tagged with the GitHub topic "scraping", sorted by relevance and shown with GitHub stars and monthly downloads.
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.4M 6K 363
firecrawl
firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

7M 114K 7K
fake-useragent
fake-useragent

Up-to-date simple useragent faker with real world database

6.5M 4K 537
scrapy
parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

4.2M 1K 161
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K
apify
apify-client

Apify API client for Python

1.9M 91 16
serpapi
google-search-results

Google Search Results via SERP API pip Python Package

1.6M 736 121
daijro
browserforge

🎭 Intelligent browser header & fingerprint generator

1.5M 1K 84
apify
apify-fingerprint-datapoints

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

1.2M 2K 194
ultrafunkamsterdam
undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)

1.2M 13K 1K
daijro
camoufox

🦊 Anti-detect browser

914K 8K 679
kennethreitz
requests-html

Pythonic HTML Parsing for Humans™

816K 328 42
firecrawl
firecrawl

🔥 The API to search, scrape, and interact with the web for AI

718K 114K 7K
MichaelTatarski
fake-http-header

A Python package to generate random request fields for an HTTP header.

588K 44 2
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

585K 47K 4K
apify
crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

541K 9K 712
ScrapeGraphAI
scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

296K 71 14
apify
apify

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.

240K 167 23
d60
twikit

Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

162K 4K 526
ZenRows
zenrows

SDK to access the ZenRows API directly from Python. We handle proxy rotation, headless browsers, and CAPTCHAs for you.

160K 18 9
simonw
shot-scraper

A command-line utility for taking automated screenshots of websites

115K 2K 115
scrapy-plugins
scrapy-zyte-api

Zyte API integration for Scrapy

110K 40 21
soxoj
maigret

🕵️‍♂️ Collect a dossier on a person by username from 3000+ sites

87K 23K 2K
rebrowser
rebrowser-playwright

A drop-in replacement for playwright-python patched with rebrowser-patches. It allows you to pass modern automation detection tests.

78K 99 10
Data from PyPI, GitHub, ClickHouse, and BigQuery