Text Similarity Python Packages

simphile

Python text similarity NLP library

7K 37 6

text2vec

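text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.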

5K 5K 427

sentence-plagiarism

Compares sentences from an input document with all sentences from reference documents to find very similar ones.

685 3 0

torchblocks-chen

A PyTorch-based toolkit for natural language processing

639 160 27

textgo

Let's go and play with text!

532 45 3

company-name-matcher

A library for matching and comparing company names using a fine-tuned sentence transformer model

479 9 1

recs-searcher

Python library for correcting registry and spelling errors in user input when comparing with a database of texts.

454 2 0

akin

Python library for detecting near duplicate texts in a corpus at scale.

385 9 0

pysemantics

Free Python client that uses the digitalowl.org NLP API.

234 9 2

gts-engine

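GTS Engine: A powerful NLU Training System. GTS Engine (GTS-Engine) is an out-of-the-box, high-performance natural language understanding engine focused on few-shot tasks; it can automatically produce NLP models from only a small number of samples.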

219 93 10

dandelion-eu

A Python client for connecting to all the services provided by https://dandelion.eu

180 35 15

torchblocks

A PyTorch-based toolkit for natural language processing

176 160 27

near-synonym

near-synonym, a Chinese antonym/synonym toolkit.

138 31 3

char-similar

Character similarity: glyph, pinyin, and semantic similarity for Chinese characters (single characters; can be used for data augmentation and for building confusion sets in CSC typo detection and recognition tasks).

101 22 3

compario

A new package that uses large language models and pattern matching to perform structured similarity comparisons between textual content based on normalized compression distance. Users provide multiple

87 1 0
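
For readers unfamiliar with the normalized compression distance mentioned in the compario description, here is a minimal sketch of the general technique. It does not use compario's API, which is not shown in this listing; the function names `compressed_size` and `ncd` are hypothetical, and the choice of zlib as the compressor is an assumption made only for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Lower values suggest more similar texts.
    The compressor (zlib) is an illustrative assumption.
    """
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog " * 5
    doc_b = "pack my box with five dozen liquor jugs today " * 5
    print("same text:     ", ncd(doc_a, doc_a))
    print("different text:", ncd(doc_a, doc_b))
```

With a real compressor the distance between identical texts is small but usually not exactly zero, so scores are best read as relative rankings rather than absolute values.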

xiangsi

Chinese text similarity calculator

86 170 23

qs-kpa

Matching The Statements: A Simple and Accurate Model for Key Point Analysis (ArgMining | EMNLP 2021)

70 12 1

textblob-ar-mk

Arabic language extension for TextBlob.

67 86 24

xiangshi

Chinese text similarity calculator

14 170 23