PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
J535D165
recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

4.6M 1K 153
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

718K 2K 234
dedupeio
dedupe

🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

99K 4K 569
maxharlow
csvmatch

🔎 Finds fuzzy matches between CSV files

8K 191 21
data61
anonlink

Python implementation of anonymous linkage using cryptographic linkage keys

7K 74 8
benzsevern
goldenmatch

🟡 Golden Suite — polyglot data-quality + entity-resolution toolkit. GoldenCheck profiles → GoldenFlow standardizes → GoldenMatch dedupes → GoldenPipe orchestrates. Zero-config defaults, 97% F1, MCP server per package + one master, multi-arch container images, drop-in Airflow DAGs.

5K 36 5
cangyuanli
floof

Fuzzy matching made easy

5K 5 0
data61
clkhash

CLK hash: hash PII for entity matching

3K 47 7
fritshermans
deduplipy

Python package for deduplication/entity resolution using active learning

2K 82 8
ajl2718
whereabouts

Fast, accurate open source geocoding in Python

2K 71 11
ihmeuw
easylink

A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.

1K 11 0
moj-analytical-services
splink-graph

pyspark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other domains)

1K 10 5
NickCrews
mismo

The SQL/Ibis powered sklearn of record linkage

1K 23 4
data61
blocklib

Python implementations of record linkage blocking techniques.

1K 21 4
ul-mds
gecko-syndata

Python library for the generation and mutation of realistic personal identification data at scale

1K 6 2
data61
anonlink-client

Client side tool for clkhash and blocklib

810 6 2
ADBond
splinkclickhouse

Allows Clickhouse to be used as the execution engine for Splink

700 6 0
ncn-foreigners
blockingpy

Blocking records for record linkage and data deduplication based on ANN algorithms in Python.

684 20 2
ipums
hlink

Hierarchical record linkage at scale

645 13 2
usc-isi-i2
rltk

Record Linkage ToolKit (Find and link entities)

606 111 22
ul-mds
pprl-model

Collection of software packages for performing privacy-preserving record linkage based on Bloom filters

552 1 0
vintasoftware
entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

425 161 16
dedupeio
dedupe-fork-eccovia

A python library for accurate and scalable data deduplication and entity-resolution

384 4K 569
AI-team-UoA
privjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Privacy Preserving Record Linkage workflows.

357 5 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery