PyPI Stats

Word Segmentation Python Packages

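Python packages tagged with the GitHub topic word-segmentation, sorted by relevance. Each entry lists the repository owner, the package name, a short description, and a row of three figures: the first is monthly downloads and the second is GitHub stars; the third is not labeled in the source.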
google
sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

32.8M 12K 1K
PyThaiNLP
pythainlp

Thai natural language processing in Python

1.2M 1K 295
mammothb
symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

421K 869 126
bab2min
kiwipiepy

Python API for Kiwi

252K 375 33
taishi-i
nagisa

A Japanese tokenizer based on recurrent neural networks

228K 417 23
bab2min
kiwipiepy-model

Python API for Kiwi

149K 375 33
modelscope
adaseq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

50K 453 44
jidasheng
bi-lstm-crf

A PyTorch implementation of the BI-LSTM-CRF model.

38K 261 46
jacksonllee
pycantonese

Cantonese Linguistics and NLP

20K 404 41
jacksonllee
rustling

A high-performance library for computational linguistics

14K 2 0
vkcom
youtokentome

Unsupervised text tokenizer focused on computational efficiency

11K 977 108
jacksonllee
wordseg

Word segmentation models

7K 5 1
JayYip
bert-multitask-learning

BERT for Multitask Learning

3K 544 123
cbaziotis
ekphrasis

Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.

2K 675 94
baidu
lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

2K 4K 590
google
tf-sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

2K 12K 1K
dnanhkhoa
vncorenlp

A Python wrapper for VnCoreNLP using a bidirectional communication channel.

1K 58 18
hellonlp
hellonlp

NLP tools: word segmentation, sentence segmentation, new-word discovery

821 27 9
messense
cjieba

Python cffi binding to CppJieba

811 15 0
viig99
symspellcpppy

Fast SymSpell written in C++ and exposed to Python via pybind11

680 44 9
ABTdomain
dksplit

High-performance string segmentation using BiLSTM-CRF

649 2 0
Systemcluster
kitoken

Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

621 49 2
monpa-team
monpa

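MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition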

497 247 25
akhvorov
vgram

V-gram builder library

483 7 0
Data from PyPI, GitHub, ClickHouse, and BigQuery