PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.2M 6K 363
myifeng
article-parser

Extract an article or news story from a URL or HTML, parse the title and content, and output in Markdown format.

824 50 6
opendatalab
mineru-html

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

626 239 25
artiomn
markdown-tool

Parse a Markdown article, download its images, and replace image URLs with local paths

302 127 27
rexdivakar
llmparser

Turn any website into clean, structured content that language models can actually read.

101 2 0
arachnio
arachnio

Client library for interacting with Arachnio API

89 0 0
johnbumgarner
newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

86 34 3
Data from PyPI, GitHub, ClickHouse, and BigQuery