PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics
J535D165
recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

4.6M 1K 153
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

718K 2K 234
dedupeio
dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

99K 4K 569
maxharlow
csvmatch

🔎 Finds fuzzy matches between CSV files

8K 191 21
data61
anonlink

Python implementation of anonymous linkage using cryptographic linkage keys

7K 74 8
benzsevern
goldenmatch

🟡 Golden Suite — polyglot data-quality + entity-resolution toolkit. GoldenCheck profiles → GoldenFlow standardizes → GoldenMatch dedupes → GoldenPipe orchestrates. Zero-config defaults, 97% F1, MCP server per package + one master, multi-arch container images, drop-in Airflow DAGs.

5K 36 5
cangyuanli
floof

Fuzzy matching made easy

5K 5 0
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

4K 1K 165
Picovoice
pvrhino

On-device Speech-to-Intent engine powered by deep learning

4K 700 95
SkyeAv
tablassert

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.

2K 5 0
Org-EthereaLogic
etherealogic-aetheriaforge

Databricks-native intelligent data transformation engine — coherence-scored Bronze/Silver/Gold with entity resolution and temporal reconciliation in a single deployable product.

2K 1 0
AI-team-UoA
pyjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

2K 93 13
fritshermans
deduplipy

Python package for deduplication/entity resolution using active learning

2K 82 8
raphschlatt
ads-and

NAND-based author name disambiguation for SAO/NASA ADS publication metadata

2K 1 0
Picovoice
pvrhinodemo

On-device Speech-to-Intent engine powered by deep learning

2K 700 95
ihmeuw
easylink

A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.

1K 11 0
DerwenAI
strwythura

Strwythura: construct an entity-resolved knowledge graph from structured data sources and unstructured content sources, implementing an ontology pipeline, plus context engineering for optimizing AI application outcomes within a specific domain. This produces a Streamlit app, with MLOps instrumentation.

1K 223 25
pmart123
cymbology

Financial identifier validation.

1K 15 1
NickCrews
mismo

The SQL/Ibis-powered sklearn of record linkage

1K 23 4
ADBond
splinkclickhouse

Allows Clickhouse to be used as the execution engine for Splink

700 6 0
databricks-industry-solutions
databricks-arc

ARC: data linking solution for Databricks with Splink

651 53 22
usc-isi-i2
rltk

Record Linkage ToolKit (Find and link entities)

606 111 22
dobraczka
kiez

🏘️ Hubness reduced nearest neighbor search for entity alignment with knowledge graph embeddings

526 29 3
senzing-garage
sz-semantics

Transform JSON output from Senzing SDK for use with graph technologies, semantics, and downstream LLM integrations

438 18 3
Data from PyPI, GitHub, ClickHouse, and BigQuery