PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
taleinat
fuzzysearch

Find parts of long text or data, allowing for some changes/typos.

753K 342 27
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

718K 2K 234
mammothb
symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

420K 869 126
chrislit
abydos

Abydos NLP/IR library for Python

59K 194 43
matchms
matchms

Python library for processing (tandem) mass spectrometry data and for computing spectral similarities.

47K 256 78
gandersen101
spaczz

Fuzzy matching and more functionality for spaCy.

17K 258 31
RobinL
fuzzymatcher

Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4

11K 286 60
maxharlow
csvmatch

🔎 Finds fuzzy matches between CSV files

8K 191 21
austinv11
prefixtrie

This is a high-performance implementation of a Prefix Trie to perform efficient fuzzy string matches.

6K 1 0
proycon
analiticcl

An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction (mirror of https://codeberg.org/proycon/analiticcl)

6K 39 4
benzsevern
goldenmatch

🟡 Golden Suite — polyglot data-quality + entity-resolution toolkit. GoldenCheck profiles → GoldenFlow standardizes → GoldenMatch dedupes → GoldenPipe orchestrates. Zero-config defaults, 97% F1, MCP server per package + one master, multi-arch container images, drop-in Airflow DAGs.

5K 36 5
cangyuanli
floof

Fuzzymatching made easy

5K 5 0
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

4K 1K 165
yymao
fuzzyname

A simple Python class for easier name matching (especially in academia).

3K 2 0
dbousque
batch-jaro-winkler

Fast batch Jaro-Winkler distance implementation in C99 with Ruby, OCaml and Python bindings.

3K 27 4
rosette-api
rosette-api

Babel Street Analytics Client Library for Python

3K 38 37
Christopher-Thornton
hmni

📛 Fuzzy Name Matching with Machine Learning

3K 268 51
AI-team-UoA
pyjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

2K 93 13
fritshermans
deduplipy

Python package for deduplication/entity resolution using active learning

2K 82 8
benzsevern
infermap

Inference-driven schema mapping engine for Python and TypeScript. 7 built-in scorers, domain dictionaries (healthcare/finance/ecommerce), confidence calibration, cross-language accuracy benchmark (F1 0.84), and full Python↔TypeScript parity.

2K 0 0
sayedyousef
arabnamer

Offline Arabic name transliteration & fuzzy similarity. No LLM, no API calls, names never leave your infrastructure. Ships a 38 MB pruned XGBoost model + 22K EN↔AR dictionary. 98.4 lenient accuracy on MENA benchmark. Built for KYC, compliance, entity resolution, Arabic NLP preprocessing.

2K 0 0
iomega
spec2vec

Word2Vec based similarity measure of mass spectrometry data.

2K 84 20
Neelagiri65
bharataddress

Deterministic, offline parser for messy Indian addresses. 26,711 pincodes embedded with OSM centroids, DIGIPIN encode/decode, phonetic alias matching (Gurgaon/Gurugram, Bengaluru/Bangalore), zero required dependencies. Optional opt-in Nominatim geocoding.

1K 0 0
scossin
iamsystem

A Python implementation of the IAMsystem algorithm

978 7 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery