PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics
J535D165
recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

4.6M 1K 153
dedupeio
dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

99K 4K 569
mighty-justice
django-super-deduper

Utilities for de-duping Django model instances

5K 32 9
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

4K 1K 165
kdeldycke
mail-deduplicate

📧 CLI to deduplicate mails from mailboxes

1K 196 42
knjcode
imgdupes

Identifying and removing near-duplicate images using perceptual hashing.

611 389 24
kdeldycke
maildir-deduplicate

Deduplicate mails from a set of maildir folders.

595 196 42
dedupeio
dedupe-fork-eccovia

A Python library for accurate and scalable data deduplication and entity resolution

384 4K 569
dssg
superdeduper

A simple interface to datamade/dedupe to make probabilistic record linkage easy.

258 43 5
laktak
chkbit

Check your files for data corruption and run quick file deduplication

255 175 13
yugn
yadupe

Recursively scan one or more given directories for duplicate files.

172 0 1
chansooligans
oagdedupe

oagdedupe is a Python library for scalable entity resolution that uses active learning to learn blocking configurations, generate comparison pairs, and then classify matches.

156 2 1
dssg
pgdedupe

A simple command line interface to the datamade/dedupe library.

96 43 5
dedupeio
dedupe-fh

A Python library for accurate and scalable data deduplication and entity resolution

71 4K 569
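Most of the packages listed above address record deduplication or entity resolution. The shared core idea (group candidate records into blocks, compare candidates field by field, and keep pairs above a similarity cut-off) can be sketched with the Python standard library alone. This is a simplified, rule-based illustration, not the method of any listed package: the function names `normalize`, `candidate_pairs`, and `find_duplicates`, the zip-code blocking key, the sample records, and the 0.85 threshold are all hypothetical choices for this example, and `difflib.SequenceMatcher` stands in for the learned or probabilistic comparison models that libraries such as dedupe and recordlinkage provide.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so formatting noise is ignored.
    return " ".join(text.lower().split())


def candidate_pairs(records, block_key):
    # Blocking: bucket records by a cheap key and only pair records that share a
    # bucket, instead of comparing every record with every other record.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    for ids in blocks.values():
        yield from combinations(ids, 2)


def find_duplicates(records, field, block_key, threshold=0.85):
    # Score each candidate pair on one field; keep pairs at or above the threshold.
    matches = []
    for i, j in candidate_pairs(records, block_key):
        score = SequenceMatcher(
            None, normalize(records[i][field]), normalize(records[j][field])
        ).ratio()
        if score >= threshold:
            matches.append((i, j, round(score, 3)))
    return matches


# Hypothetical sample data for illustration only.
records = [
    {"name": "Jonathan Smith", "zip": "10001"},
    {"name": "Jonathon  Smith", "zip": "10001"},  # typo plus a double space
    {"name": "Jane Doe", "zip": "10001"},
    {"name": "Jonathan Smith", "zip": "94105"},   # same name, different block
]

print(find_duplicates(records, "name", block_key=lambda r: r["zip"]))
# → [(0, 1, 0.929)]
```

Only the two spellings of "Jonathan Smith" that share a zip code are reported. The fourth record has an identical name but falls into a different block and is never compared, which is the recall trade-off blocking accepts in exchange for far fewer comparisons.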
Data from PyPI, GitHub, ClickHouse, and BigQuery.