PyPI Stats

Metadata Extraction Python Packages

Python packages tagged with the GitHub topic metadata-extraction, sorted by relevance and listed with their GitHub stars and monthly downloads.
adbar
htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

9.8M 148 30
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

167K 8K 472
oduwsdl
aiu

A library for interacting with web archive collections at Archive-It, Trove, Pandora, and more.

7K 8 1
iluvcapra
wavinfo

Probe WAVE Files for all metadata

6K 43 10
kobaltcore
pymage-size

A utility package for getting image dimensions without loading files into memory. No dependencies!

4K 16 1
jakiki6
ruminant

Recursive metadata extraction tool

2K 5 1
tern-tools
tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.

1K 1K 188
fvaleye
metadata-guardian

Provide an easy way with Python to protect your data sources by searching their metadata. 🛡️

934 18 1
radusuciu
traktor-nowplaying

traktor_nowplaying uses Traktor's broadcast functionality to extract metadata about the currently playing song.

851 66 8
lstein
photomapai

A modern image browser and search tool that uses AI to generate a "semantic map" of your collection.

837 70 4
DanTsai0903
namingpaper

CLI tool to rename academic papers using AI-extracted metadata

762 7 1
d3x-at
sd-parsers

A Python library to read metadata from images created by Stable Diffusion.

751 45 4
rsmvdl
metaspector

Python library to inspect and export metadata from MP4/M4V/M4A, MP3 and FLAC media files.

677 3 0
sdsc-ordes
gimie

Extract linked metadata from repositories

546 14 2
lttkgp
music-metadata-extractor

Extract song metadata from YouTube links with Spotify API

534 16 7
itsbigspark
pymetagen

Metadata Generator

494 0 0
VritraSecz
gitspyx

Advanced OSINT tool for GitHub reconnaissance — get profiles, repo insights & metadata instantly.

492 6 1
mauricelambert
spyware

This package implements a complete SpyWare.

426 154 32
baughmann
tikara

The metadata and text content extractor for almost every file type.

425 9 0
shantanubafna
geotcha

Extract and harmonize RNA-seq metadata from NCBI GEO

366 0 0
m8sec
pymetasec

Utility to download and extract document metadata from an organization. This technique can be used to identify domains, usernames, software and version numbers, and naming conventions.

366 513 88
meysam81
sitemap-harvester

Crawl sitemap of a given website and export metadata of its pages recursively into CSV format.

332 5 0
ankit-chaubey
surgery

Offline CLI tool for inspecting and modifying media metadata.

249 10 1
truethari
infomedia

Python application that can be used to retrieve media file information such as duration, frame rate, bit rate, etc.

214 7 0
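
Several entries above describe reading metadata from a file header rather than decoding the whole file; the pymage-size entry, for example, mentions getting image dimensions without loading files into memory. The sketch below illustrates that general idea for PNG only, using the standard library. It is not taken from pymage-size or any other listed package; the function names `png_dimensions` and `png_dimensions_from_path` are hypothetical, and it assumes a well-formed PNG whose first chunk is IHDR, as the PNG format requires.

```python
import struct
from typing import Tuple

# Fixed 8-byte signature that opens every PNG stream.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> Tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG stream.

    Hypothetical helper for illustration; not the API of any listed package.
    """
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    # Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
    if header[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # IHDR starts with two big-endian unsigned 32-bit integers: width, height.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def png_dimensions_from_path(path: str) -> Tuple[int, int]:
    """Read only the first 24 bytes of the file, never the pixel data."""
    with open(path, "rb") as fh:
        return png_dimensions(fh.read(24))
```

Because only 24 bytes are read, the cost does not depend on the image size. Other formats, such as JPEG, store their dimensions further into the file and need a different parser.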
Data from PyPI, GitHub, ClickHouse, and BigQuery