PyPI Stats

Fuzzy Matching Python Packages

Python packages with the GitHub topic fuzzy-matching. Sorted by relevance, with stars and monthly downloads.
taleinat
fuzzysearch

Find parts of long text or data, allowing for some changes/typos.

763K 342 27
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

740K 2K 234
mammothb
symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

421K 869 126
chrislit
abydos

Abydos NLP/IR library for Python

56K 194 43
matchms
matchms

Python library for processing (tandem) mass spectrometry data and for computing spectral similarities.

48K 256 78
gandersen101
spaczz

Fuzzy matching and more functionality for spaCy.

17K 258 31
RobinL
fuzzymatcher

Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4

11K 286 60
maxharlow
csvmatch

🔎 Finds fuzzy matches between CSV files

8K 191 21
austinv11
prefixtrie

This is a high-performance implementation of a Prefix Trie to perform efficient fuzzy string matches.

6K 1 0
proycon
analiticcl

An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction (mirror of https://codeberg.org/proycon/analiticcl)

6K 39 4
benzsevern
goldenmatch

🟡 Golden Suite — polyglot data-quality + entity-resolution toolkit. GoldenCheck profiles → GoldenFlow standardizes → GoldenMatch dedupes → GoldenPipe orchestrates. Zero-config defaults, 97% F1, MCP server per package + one master, multi-arch container images, drop-in Airflow DAGs.

5K 36 5
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

5K 1K 165
cangyuanli
floof

Fuzzy matching made easy

5K 5 0
yymao
fuzzyname

A simple Python class for easier name matching (especially in academia).

3K 2 0
dbousque
batch-jaro-winkler

Fast batch Jaro-Winkler distance implementation in C99 with Ruby, OCaml and Python bindings.

3K 27 4
Christopher-Thornton
hmni

📛 Fuzzy Name Matching with Machine Learning

2K 268 51
rosette-api
rosette-api

Babel Street Analytics Client Library for Python

2K 38 37
AI-team-UoA
pyjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

2K 93 13
benzsevern
infermap

Inference-driven schema mapping engine for Python and TypeScript. 7 built-in scorers, domain dictionaries (healthcare/finance/ecommerce), confidence calibration, cross-language accuracy benchmark (F1 0.84), and full Python↔TypeScript parity.

2K 0 0
fritshermans
deduplipy

Python package for deduplication/entity resolution using active learning

2K 82 8
sayedyousef
arabnamer

Offline Arabic name transliteration & fuzzy similarity. No LLM, no API calls, names never leave your infrastructure. Ships a 38 MB pruned XGBoost model + 22K EN↔AR dictionary. 98.4 lenient accuracy on MENA benchmark. Built for KYC, compliance, entity resolution, Arabic NLP preprocessing.

2K 0 0
iomega
spec2vec

Word2Vec based similarity measure of mass spectrometry data.

2K 84 20
Neelagiri65
bharataddress

Deterministic, offline parser for messy Indian addresses. 26,711 pincodes embedded with OSM centroids, DIGIPIN encode/decode, phonetic alias matching (Gurgaon/Gurugram, Bengaluru/Bangalore), zero required dependencies. Optional opt-in Nominatim geocoding.

1K 0 0
scossin
iamsystem

A Python implementation of the IAMsystem algorithm

984 7 1
Data from PyPI, GitHub, ClickHouse, and BigQuery