PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
akamhy
waybackpy

Wayback Machine API interface & a command-line tool

2.6M 575 41
webrecorder
warcio

Streaming WARC/ARC library for fast web archive IO

1.2M 456 69
webrecorder
pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

11K 2K 239
oduwsdl
ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

7K 650 41
ArchiveBox
archivebox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

6K 27K 2K
bellingcat
auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

5K 1K 100
webrecorder
cdxj-indexer

CDXJ Indexing of WARC/ARCs

4K 34 15
cocrawler
cdx-toolkit

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

4K 204 34
GeiserX
wayback-archive

A comprehensive tool for downloading and archiving websites from the Wayback Machine

2K 8 3
eliask
farchive

Local content-addressed archive with observation history. Stores bytes by SHA-256, preserves locator state as contiguous spans, compresses with zstd and corpus-trained dictionaries. SQLite-backed.

885 6 0
GeiserX
wayback-diff

Intelligent web page comparison tool with Wayback Machine support and visual regression testing

716 1 0
Own-Data-Privateer
hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

599 120 10
internetarchive
fatcat-openapi-client

API client library for fatcat.wiki (a bibliographic catalog)

593 121 18
caltechlibrary
eprints2archives

Send EPrints URLs to the Internet Archive and other archives

398 4 0
internetarchive
scrapy-warcio

Support for writing WARC files with Scrapy

385 24 6
Florents-Tselai
warcdb

WarcDB: Web crawl data as SQLite databases

381 404 10
Own-Data-Privateer
hoardy-web-sas

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

204 120 10
ArchiveBox
archivebox-likn

The self-hosted internet archive.

132 27K 2K
ikreymer
pywayback

Core Python Web Archiving Toolkit for replay and recording of web archives

1 2K 239
Data from PyPI, GitHub, ClickHouse, and BigQuery