Scraping Python Packages

trafilatura

Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML

7.4M 6K 363

firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

7M 114K 7K

fake-useragent

Up-to-date, simple user-agent faker with a real-world database

6.5M 4K 537

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

4.2M 1K 161

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K

apify-client

Apify API client for Python

1.9M 91 16

google-search-results

Pip-installable Python package for Google Search results via SERP API

1.6M 736 121

browserforge

🎭 Intelligent browser header & fingerprint generator

1.5M 1K 84

apify-fingerprint-datapoints

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

1.2M 2K 194

undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

1.2M 13K 1K

camoufox

🦊 Anti-detect browser

914K 8K 679

requests-html

Pythonic HTML Parsing for Humans™

816K 328 42

firecrawl

🔥 The API to search, scrape, and interact with the web for AI

718K 114K 7K

fake-http-header

A Python package to generate random request fields for an HTTP header.

588K 44 2

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

585K 47K 4K

crawlee

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

541K 9K 712

scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

296K 71 14

apify

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.

240K 167 23