Dependents of w3lib - PyPI Stats

124 dependents

Package	Description	Downloads/month
parsel	Parsel lets you extract data from XML/HTML documents using XPath or CSS selector...	4.2M
scrapy	Scrapy, a fast high-level web crawling & scraping framework for Python.	3.4M
requests-html	Pythonic HTML Parsing for Humans™	815K
scrapling	🕷️ An adaptive Web Scraping framework that handles everything from a single requ...	564K
extruct	Extract embedded metadata from HTML markup	341K
newspaper4k	Simplified python article discovery & extraction.	319K
zyte-api	Python client for Zyte API	218K
scrapyd	A service daemon to run Scrapy spiders	47K
web-poet	Web scraping Page Objects core library	40K
scrapy-zyte-smartproxy	Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy	22K
playwrightcapture	Capture a URL with Playwright	20K
zyte-parsers	Parsing of data from web pages.	19K
formasaurus	Formasaurus tells you the type of an HTML form and its fields using machine lear...	9K
docmancer	Compress local documentation context for coding agents.	8K
har2tree	Make a tree from a HAR file	7K
directory-client-core	Python common code for Directory API clients.	6K
archivebox	🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Po...	6K
papers-dl	A command line application for downloading scientific papers.	5K
requests-xml	Requests-XML: XML Parsing for Humans	4K
duplicate-url-discarder	Discarding duplicate URLs based on rules.	4K
form2request	Build HTTP requests out of HTML forms	3K
frontera	A scalable frontier for web crawlers	3K
crawlo	A high-performance asynchronous distributed crawler framework based on asyncio, supporting both standalone and distributed deployment	3K
aio-scrapy	Implement scrapy with asyncio	3K
pynetcraftcli	A simple Python script to interact with Netcraft APIs from CLI	2K
scrapy-frontera	More flexible and featured Frontera scheduler for Scrapy	2K
board-game-scraper	Board games data scraping and processing from BoardGameGeek and more!	2K
scrapydweb	Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Aut...	2K
subdx-dl	A cli tool for download subtitle from www.subdivx.com with the better possible m...	2K
pycrdfcli	A simple Python script to interact with CRDF (crdf.fr) APIs from CLI.	2K
html-to-etree	convenience method for parsing html to lxml elementtree using sane character dec...	2K
tungstenkit	ML container made simple	2K
bondai	An AI-powered console assistant with a versatile API for seamless integration in...	2K
cognis-controller	Decoupled control plane for AI agents	2K
ispider	A high-speed web spider for massive scraping.	2K
feedsearch-crawler	Search sites for RSS, Atom, and JSON feeds	2K
cognis-executor	Standalone remote executor for Cognis	2K
enex2notion	Import Evernote ENEX files to Notion	2K
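aiospider-tarkin	A high-performance asynchronous crawler framework based on asyncio, supporting multiple data stores such as MySQL/MongoDB/Redis	2K
bricks-py	Quickly build your crawler or other projects	1K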
kerko	A web application component that provides a faceted search interface for bibliog...	1K
strwythura	Strwythura: construct an entity-resolved knowledge graph from structured data so...	1K
scrapy3	Scrapy, a fast high-level web crawling & scraping framework for Python.	1K
scrapydd	Scrapydd is a system for scrapy spiders distributed running and scheduleing syst...	1K
requests-htmlc	Fork of requests-html, powered by playwright	984
requests-html-playwright	Requests-HTML(with microsoft/playwright-python): HTML Parsing for Humans™	906
steem	The official Python (3) library for the Steem Blockchain.	891
sd-scrapely	A pure-python HTML screen-scraping library	841
songbirdcli	Songbird's cli.	832
scrachy	Provides an SqlAlchemy based cache storage backend, a Selenium middleware, and a...	832