PyPI Stats

Html To Markdown Python Packages

Python packages with the GitHub topic html-to-markdown. Sorted by relevance, with stars and monthly downloads.
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.5M 6K 363
firecrawl
firecrawl-py

🔥 The API to search, scrape, and interact with the web for AI

7M 114K 7K
firecrawl
firecrawl

🔥 The API to search, scrape, and interact with the web for AI

746K 114K 7K
spider-rs
spider-client

Python, Javascript, and Rust libraries for the Spider Cloud API.

406K 25 9
Spenhouet
confluence-markdown-exporter

Export Atlassian Confluence pages as markdown files.

36K 396 104
tim-gromeyer
pyhtml2md

Transform your HTML into clean, easy-to-read markdown with html2md.

26K 81 11
us
crw

Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v1/scrape, /v1/crawl, /v1/search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.

3K 71 5
pankaj28843
article-extractor

Pure-Python article extraction library and HTTP API - Extract clean content from web pages as Markdown or HTML

2K 0 0
muchdogesec
file2txt

file2txt is a Python library that takes common file formats and turns them into plain text (a txt file) with Markdown styling.

1K 12 2
nanonets
llm-data-converter

Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract

1K 7 1
paulpierre
markdown-crawler

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG

1K 441 53
QuartzUnit
markgrab

Universal web content extraction - URL to LLM-ready markdown

782 0 0
renesugar
html2txt

Convert HTML to markdown

348 1 2
nanonets
document-data-extractor

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

269 7 1
mazzasaverio
url2md4ai

Lean Python tool for extracting clean, LLM-optimized markdown from web pages. Handles dynamic content with Playwright + Trafilatura for maximum information extraction efficiency.

252 4 0
spider-rs
spiderwebai-py

Python, Javascript, and Rust libraries for the Spider Cloud API.

200 25 9
yannickperrenet
bookmarkdown

✅ Parse your browser's exported HTML bookmark file to Markdown.

151 18 0
spider-rs
spiderclient-py

Python, Javascript, and Rust libraries for the Spider Cloud API.

31 25 9
spider-rs
spidercloud-py

Python, Javascript, and Rust libraries for the Spider Cloud API.

31 25 9
trubitsyn
bookmarks2markdown

Convert bookmarks to Markdown

3 5 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery