PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
google
sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

32.5M 12K 1K
PyThaiNLP
pythainlp

Thai natural language processing in Python

1.2M 1K 295
mammothb
symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm

420K 869 126
bab2min
kiwipiepy

Python API for Kiwi

252K 375 33
taishi-i
nagisa

A Japanese tokenizer based on recurrent neural networks

231K 417 23
bab2min
kiwipiepy-model

Python API for Kiwi

149K 375 33
modelscope
adaseq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

53K 453 44
jidasheng
bi-lstm-crf

A PyTorch implementation of the BI-LSTM-CRF model.

38K 261 46
jacksonllee
pycantonese

Cantonese Linguistics and NLP

18K 404 41
jacksonllee
rustling

A high-performance library for computational linguistics

13K 2 0
vkcom
youtokentome

Unsupervised text tokenizer focused on computational efficiency

11K 977 108
jacksonllee
wordseg

Word segmentation models

6K 5 1
JayYip
bert-multitask-learning

BERT for Multitask Learning

2K 544 123
cbaziotis
ekphrasis

Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.

2K 675 94
baidu
lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

2K 4K 590
google
tf-sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

2K 12K 1K
dnanhkhoa
vncorenlp

A Python wrapper for VnCoreNLP using a bidirectional communication channel.

1K 58 18
messense
cjieba

Python cffi binding to CppJieba

755 15 0
hellonlp
hellonlp

NLP tools: word segmentation, sentence segmentation, new-word discovery

741 27 9
viig99
symspellcpppy

Fast SymSpell written in C++ and exposed to Python via pybind11

653 44 9
Systemcluster
kitoken

Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

646 49 2
ABTdomain
dksplit

High-performance string segmentation using BiLSTM-CRF

627 2 0
monpa-team
monpa

MONPA is an end-to-end model to jointly conduct Chinese word segmentation, POS and NE labeling

505 247 25
akhvorov
vgram

V-gram builder library

484 7 0
Data from PyPI, GitHub, ClickHouse, and BigQuery