Word Segmentation Python Packages

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

32.5M 12K 1K

pythainlp

Thai natural language processing in Python

1.2M 1K 295

symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

420K 869 126
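
The Symmetric Delete idea named in the description above can be illustrated with a minimal standard-library sketch. This is not symspellpy's API or implementation; the function names (`deletes`, `build_index`, `candidates`) and the tiny vocabulary are hypothetical. The premise is that both dictionary words and the query are expanded by deleting characters only, so lookup reduces to hash-table hits.

```python
def deletes(word, max_distance=1):
    """Return the word plus every variant with up to max_distance characters removed."""
    variants = {word}
    frontier = {word}
    for _ in range(max_distance):
        # Remove one more character from every string produced in the previous round.
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        variants |= frontier
    return variants


def build_index(vocabulary, max_distance=1):
    """Map each delete-variant to the set of vocabulary words that produce it."""
    index = {}
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def candidates(term, index, max_distance=1):
    """Collect vocabulary words that share at least one delete-variant with the term."""
    found = set()
    for variant in deletes(term, max_distance):
        found |= index.get(variant, set())
    return found


# Hypothetical three-word vocabulary, for illustration only.
index = build_index(["hello", "help", "world"])
suggestions = candidates("helo", index)
```

The candidate set is only a shortlist: two words that share a delete-variant can be further apart than `max_distance`, so a real spelling corrector would still verify each candidate with an edit-distance check and rank by word frequency.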

kiwipiepy

Python API for Kiwi

252K 375 33

nagisa

A Japanese tokenizer based on recurrent neural networks

231K 417 23

kiwipiepy-model

Python API for Kiwi

149K 375 33

adaseq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

53K 453 44

bi-lstm-crf

A PyTorch implementation of the BI-LSTM-CRF model.

38K 261 46

pycantonese

Cantonese Linguistics and NLP

18K 404 41

rustling

A high-performance library for computational linguistics

13K 2 0

youtokentome

Unsupervised text tokenizer focused on computational efficiency

11K 977 108

wordseg

Word segmentation models

6K 5 1

bert-multitask-learning

BERT for Multitask Learning

2K 544 123

ekphrasis

Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.

2K 675 94
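
Word segmentation of hashtags, as mentioned in the description above, can be sketched with a small dynamic program. This is a simplified illustration and not ekphrasis's method or API: the `segment` function and the toy vocabulary are hypothetical, and the sketch simply prefers the split with the fewest dictionary words instead of scoring splits with word statistics.

```python
def segment(text, vocabulary):
    """Split text into the fewest vocabulary words, or return None if no split exists."""
    n = len(text)
    max_len = max(map(len, vocabulary))
    # best[i] holds the shortest word list covering text[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


# Hypothetical toy vocabulary; a real tool would use a large word list with frequencies.
vocabulary = {"word", "segmentation", "seg", "men", "tat", "ion"}
words = segment("wordsegmentation", vocabulary)
```

Preferring fewer words is a crude stand-in for a probabilistic score; it keeps "segmentation" whole instead of splitting it into shorter dictionary entries.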

lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

2K 4K 590

tf-sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

2K 12K 1K

vncorenlp

A Python wrapper for VnCoreNLP using a bidirectional communication channel.

1K 58 18

cjieba

Python cffi binding to CppJieba

755 15 0

hellonlp

NLP tools: word segmentation, sentence segmentation, new-word discovery

741 27 9

symspellcpppy

Fast SymSpell written in C++ and exposed to Python via pybind11

653 44 9

kitoken

Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

646 49 2

dksplit

High-performance string segmentation using BiLSTM-CRF

627 2 0

monpa

MONPA is an end-to-end model to jointly conduct Chinese word segmentation, POS and NE labeling

505 247 25

vgram

V-gram builder library

484 7 0
