Crawling Python Packages

courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

7.4M 169 13

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.3M 62K 12K

newspaper3k

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

1M 15K 2K

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

612K 47K 4K

crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

541K 9K 712

scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

26K 953 353

abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

22K 111 5