124 dependents
Description Downloads/month
Parsel lets you extract data from XML/HTML documents using XPath or CSS selector... 4.2M
Scrapy, a fast high-level web crawling & scraping framework for Python. 3.4M
Pythonic HTML Parsing for Humans™ 815K
🕷️ An adaptive Web Scraping framework that handles everything from a single requ... 564K
Extract embedded metadata from HTML markup 341K
Simplified python article discovery & extraction. 319K
Python client for Zyte API 218K
A service daemon to run Scrapy spiders 47K
Web scraping Page Objects core library 40K
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy 22K
Capture a URL with Playwright 20K
Parsing of data from web pages. 19K
Formasaurus tells you the type of an HTML form and its fields using machine lear... 9K
Compress local documentation context for coding agents. 8K
Make a tree from a HAR file 7K
Python common code for Directory API clients. 6K
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Po... 6K
A command line application for downloading scientific papers. 5K
Requests-XML: XML Parsing for Humans 4K
Discarding duplicate URLs based on rules. 4K
Build HTTP requests out of HTML forms 3K
A scalable frontier for web crawlers 3K
High-performance asynchronous distributed crawler framework based on asyncio, supporting single-machine and distributed deployment 3K
Implement scrapy with asyncio 3K
A simple Python script to interact with Netcraft APIs from CLI 2K
More flexible and featured Frontera scheduler for Scrapy 2K
Board games data scraping and processing from BoardGameGeek and more! 2K
Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Aut... 2K
A CLI tool for downloading subtitles from www.subdivx.com with the best possible m... 2K
A simple Python script to interact with CRDF (crdf.fr) APIs from CLI. 2K
convenience method for parsing html to lxml elementtree using sane character dec... 2K
ML container made simple 2K
An AI-powered console assistant with a versatile API for seamless integration in... 2K
Decoupled control plane for AI agents 2K
A high-speed web spider for massive scraping. 2K
Search sites for RSS, Atom, and JSON feeds 2K
Standalone remote executor for Cognis 2K
Import Evernote ENEX files to Notion 2K
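High-performance asynchronous crawler framework based on asyncio, supporting multiple data stores such as MySQL, MongoDB, and Redis 2K
Quickly build your crawler or other projects 1K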
A web application component that provides a faceted search interface for bibliog... 1K
Strwythura: construct an entity-resolved knowledge graph from structured data so... 1K
Scrapy, a fast high-level web crawling & scraping framework for Python. 1K
Scrapydd is a system for scrapy spiders distributed running and scheduling syst... 1K
Fork of requests-html, powered by playwright 984
Requests-HTML(with microsoft/playwright-python): HTML Parsing for Humans™ 906
The official Python (3) library for the Steem Blockchain. 891
A pure-python HTML screen-scraping library 841
Songbird's cli. 832
Provides an SqlAlchemy based cache storage backend, a Selenium middleware, and a... 832