PyPI Stats

Record Linkage Python Packages

Python packages with the GitHub topic record-linkage. Sorted by relevance, with stars and monthly downloads.
J535D165
recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

4.6M 1K 153
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

740K 2K 234
dedupeio
dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

100K 4K 569
maxharlow
csvmatch

🔎 Finds fuzzy matches between CSV files

8K 191 21
data61
anonlink

Python implementation of anonymous linkage using cryptographic linkage keys

7K 74 8
benzsevern
goldenmatch

🟡 Golden Suite — polyglot data-quality + entity-resolution toolkit. GoldenCheck profiles → GoldenFlow standardizes → GoldenMatch dedupes → GoldenPipe orchestrates. Zero-config defaults, 97% F1, MCP server per package + one master, multi-arch container images, drop-in Airflow DAGs.

5K 36 5
cangyuanli
floof

Fuzzy matching made easy

5K 5 0
data61
clkhash

CLK hash: hash PII for entity matching

3K 47 7
fritshermans
deduplipy

Python package for deduplication/entity resolution using active learning

2K 82 8
ajl2718
whereabouts

Fast, accurate open source geocoding in Python

2K 71 11
moj-analytical-services
splink-graph

PySpark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record linkage (or other domains)

1K 10 5
ihmeuw
easylink

A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.

1K 11 0
data61
blocklib

Python implementations of record linkage blocking techniques.

1K 21 4
ul-mds
gecko-syndata

Python library for the generation and mutation of realistic personal identification data at scale

1K 6 2
NickCrews
mismo

The SQL/Ibis powered sklearn of record linkage

1K 23 4
data61
anonlink-client

Client side tool for clkhash and blocklib

869 6 2
ncn-foreigners
blockingpy

Blocking records for record linkage and data deduplication based on ANN algorithms in Python.

712 20 2
usc-isi-i2
rltk

Record Linkage ToolKit (Find and link entities)

679 111 22
ipums
hlink

Hierarchical record linkage at scale

648 13 2
ul-mds
pprl-model

Collection of software packages for performing privacy-preserving record linkage based on Bloom filters

640 1 0
ADBond
splinkclickhouse

Allows ClickHouse to be used as the execution engine for Splink

613 6 0
vintasoftware
entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

470 161 16
dedupeio
dedupe-fork-eccovia

A Python library for accurate and scalable data deduplication and entity resolution

387 4K 569
ufbmi
deduper

OneFlorida De-duplication Software

345 12 4
Data from PyPI, GitHub, ClickHouse, and BigQuery