PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
miso-belica
justext

Heuristic-based boilerplate removal tool

6.1M 818 89
alphanome-ai
sec-parser

Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual (semantic) structure of the document.

92K 282 78
bug-ops
fast-scrape

🦀 High-performance HTML parsing library. Rust core with native bindings for Python, Node.js & WASM. SIMD-accelerated, memory-safe, consistent API everywhere.

11K 5 0
kata198
advancedhtmlparser

Fast indexed Python HTML parser that builds a DOM node tree, providing common getElementsBy* functions for scraping, testing, modification, and formatting. Also supports XPath.

8K 101 25
rajatomar788
pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.

6K 640 117
imgurbot12
pyxml3

Pure python3 alternative to stdlib xml.etree with HTML support

4K 1 1
ispras
dedoc

Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

2K 661 52
OwenOrcan
yirabot

YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.

1K 17 0
Bystroushaak
pydhtmlparser

Lightweight HTML/XML parser for quick and dirty web scraping.

519 6 3
lexndru
hap

Hap! is an HTML parser and scraping tool.

302 1 0
yogendratamang48
parse-utils

Page Parser Utils For scraping, List index update

277 2 0
sihaelov
harser

Easy way for HTML parsing and building XPath

194 135 3
jet-logic
alterx

A powerful file processing toolkit for batch transformations of HTML, JSON, TOML, XML, and YAML files

170 0 0
esign-consulting
qarsmac

Air quality data collected from the Rio de Janeiro City Hall, Municipal Secretariat of the Environment (SMAC).

164 0 0
luxcem
apifier

A web parser for tabular and/or paginated data

159 6 1
yannickperrenet
bookmarkdown

✅ Parse your browser's exported HTML bookmark file to Markdown.

143 18 0
vincentlaucsb
pgreaper

A Python library for loading data from various formats into PostgreSQL databases.

120 12 1
MaksimJames
pyhtmltext

Useful tool for extracting text and sentences from HTML

116 1 0
yogendratamang48
parse-utils-yogen48

Page Parser Utils For scraping

115 2 0
kurtnettle
bubt-routinepy

An unofficial Python wrapper of the BUBT Routine API + a robust web scraper and PDF extractor for getting routine data.

77 0 0
invanatech
webpage-reader

Reads a webpage and extracts the information out of it, based on the HTML5 tags/classes

66 0 0
Anikeshpatel
dompy-parser

JavaScript DOM API for Python, an HTML parser, and a web scraping library

64 3 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery