124 dependents
| Description | Downloads/month | |
|---|---|---|
| Parsel lets you extract data from XML/HTML documents using XPath or CSS selector... | 4.2M | |
| Scrapy, a fast high-level web crawling & scraping framework for Python. | 3.4M | |
| Pythonic HTML Parsing for Humans™ | 815K | |
| 🕷️ An adaptive Web Scraping framework that handles everything from a single requ... | 564K | |
| Extract embedded metadata from HTML markup | 341K | |
| Simplified python article discovery & extraction. | 319K | |
| Python client for Zyte API | 218K | |
| A service daemon to run Scrapy spiders | 47K | |
| Web scraping Page Objects core library | 40K | |
| Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy | 22K | |
| Capture a URL with Playwright | 20K | |
| Parsing of data from web pages. | 19K | |
| Formasaurus tells you the type of an HTML form and its fields using machine lear... | 9K | |
| Compress local documentation context for coding agents. | 8K | |
| Make a tree from a HAR file | 7K | |
| Python common code for Directory API clients. | 6K | |
| 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Po... | 6K | |
| A command line application for downloading scientific papers. | 5K | |
| Requests-XML: XML Parsing for Humans | 4K | |
| Discarding duplicate URLs based on rules. | 4K | |
| Build HTTP requests out of HTML forms | 3K | |
| A scalable frontier for web crawlers | 3K | |
| A high-performance asynchronous distributed crawler framework based on asyncio, supporting single-machine and distributed deployment | 3K | |
| Implement scrapy with asyncio | 3K | |
| A simple Python script to interact with Netcraft APIs from CLI | 2K | |
| More flexible and featured Frontera scheduler for Scrapy | 2K | |
| Board games data scraping and processing from BoardGameGeek and more! | 2K | |
| Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Aut... | 2K | |
| A cli tool for download subtitle from www.subdivx.com with the better possible m... | 2K | |
| A simple Python script to interact with CRDF (crdf.fr) APIs from CLI. | 2K | |
| convenience method for parsing html to lxml elementtree using sane character dec... | 2K | |
| ML container made simple | 2K | |
| An AI-powered console assistant with a versatile API for seamless integration in... | 2K | |
| Decoupled control plane for AI agents | 2K | |
| A high-speed web spider for massive scraping. | 2K | |
| Search sites for RSS, Atom, and JSON feeds | 2K | |
| Standalone remote executor for Cognis | 2K | |
| Import Evernote ENEX files to Notion | 2K | |
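| A high-performance asynchronous crawler framework based on asyncio, supporting multiple data stores such as MySQL, MongoDB, and Redis | 2K | |
| Quickly build your crawler or other projects | 1K | |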
| A web application component that provides a faceted search interface for bibliog... | 1K | |
| Strwythura: construct an entity-resolved knowledge graph from structured data so... | 1K | |
| Scrapy, a fast high-level web crawling & scraping framework for Python. | 1K | |
| Scrapydd is a system for scrapy spiders distributed running and scheduleing syst... | 1K | |
| Fork of requests-html, powered by playwright | 984 | |
| Requests-HTML(with microsoft/playwright-python): HTML Parsing for Humans™ | 906 | |
| The official Python (3) library for the Steem Blockchain. | 891 | |
| A pure-python HTML screen-scraping library | 841 | |
| Songbird's cli. | 832 | |
| Provides an SqlAlchemy based cache storage backend, a Selenium middleware, and a... | 832 |