Crawler Python Packages

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.4M 6K 363

courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

7.3M 169 13

firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

7M 114K 7K

selectolax

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

5.7M 2K 91

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K

newspaper3k

newspaper3k: news, full-text, and article metadata extraction in Python 3.

1M 15K 2K

firecrawl

🔥 The API to search, scrape, and interact with the web for AI

718K 114K 7K

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

585K 44K 4K

google-play-scraper

Google Play scraper for Python, inspired by <facundoolano/google-play-scraper>

583K 967 246

spotifyscraper

Spotify scraper that extracts track information from Spotify and downloads MP3s with the song's cover art

544K 252 28

crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

541K 9K 712
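
Several entries above (courlan in particular) describe cleaning and deduplicating URLs before a crawl. As a rough illustration of that idea, here is a minimal sketch using only the Python standard library. It does not use courlan's API or any other listed package; the function names `normalize_url` and `dedupe_urls` and the `TRACKING_PARAMS` set are hypothetical and chosen only for this example.

```python
from typing import Iterable, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to strip; adjust for real use.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def normalize_url(url: str) -> Optional[str]:
    """Return a cleaned form of an HTTP(S) URL, or None if it is not crawlable."""
    parts = urlsplit(url.strip())
    # Keep only web URLs that actually name a host.
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return None
    # Drop tracking parameters but keep every other query pair in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Treat an empty path as "/" so both spellings compare equal.
    path = parts.path or "/"
    # Lowercase scheme and host, and discard the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe_urls(urls: Iterable[str]) -> List[str]:
    """Normalize URLs and keep the first occurrence of each, preserving order."""
    seen = set()
    result = []
    for url in urls:
        cleaned = normalize_url(url)
        if cleaned is not None and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

A dedicated library would typically go further than this sketch, for example with the spam, content, and language filters that courlan's description mentions.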