PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
WorksApplications
sudachipy

Sudachi in Rust 🦀 and new generation of SudachiPy

1.9M 442 50
WorksApplications
sudachidict-core

A lexicon for Sudachi

1.8M 293 20
WorksApplications
sudachidict-full

A lexicon for Sudachi

600K 293 20
taishi-i
nagisa

A Japanese tokenizer based on recurrent neural networks

231K 417 23
WorksApplications
sudachidict-small

A lexicon for Sudachi

97K 293 20
jidasheng
bi-lstm-crf

A PyTorch implementation of the BI-LSTM-CRF model.

38K 261 46
hankcs
hanlp

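Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing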

24K 36K 11K
CAMeL-Lab
camel-tools

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

24K 548 89
jacksonllee
rustling

A high-performance library for computational linguistics

13K 2 0
hankcs
hanlp-common

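Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing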

9K 36K 11K
roshan-research
hazm

Persian NLP Toolkit

9K 1K 205
hankcs
hanlp-trie

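Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing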

9K 36K 11K
huseinzol05
malaya

Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/

8K 522 139
Droidtown
articutapi

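API of Articut Chinese word segmentation (with semantic part-of-speech tagging): word segmentation (「斷詞」, also called 「分詞」) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; using only the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and recall above 96% on SIGHAN 2005.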

3K 415 38
sammous
spacy-lefff

Custom French POS and lemmatizer based on Lefff for spacy

2K 69 12
timarkh
uniparser-morph

Rule-based, linguist-friendly (and rather slow) morphological analysis

864 7 2
hankcs
hanlp-restful

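Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing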

743 36K 11K
dhchenx
ner-kit

Named Entity Recognition Toolkit

642 0 0
praatibhsurana
hindiwsd

A pipeline for transliterating Hinglish code-mixed data to Hindi, with spell correction and word sense disambiguation of Hindi words.

566 37 8
reynoldsnlp
udar

UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.

565 29 1
ysenarath
sinling

A collection of NLP tools for Sinhalese (සිංහල).

557 61 20
VinAIResearch
phonlp

PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity recognition and dependency parsing (NAACL 2021)

528 150 20
monpa-team
monpa

MONPA is an end-to-end model to jointly conduct Chinese word segmentation, POS and NE labeling

505 247 25
craigtrim
lingpatlab

LingPatLab: Linguistic Pattern Laboratory

401 2 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery