NLP Library Python Packages

spacy

💫 Industrial-strength Natural Language Processing (NLP) in Python

21.9M 34K 5K

pythainlp

Thai natural language processing in Python

1.2M 1K 295

tika

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

410K 2K 251

cn2an

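📦 Quickly convert between "Chinese numerals" and "Arabic numerals" (latest features: conversion of fractions, dates, temperatures, and more)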

263K 759 82

janome

Japanese morphological analysis engine written in pure Python

250K 913 54

nagisa

A Japanese tokenizer based on recurrent neural networks

231K 417 23

urduhack

An NLP library for the Urdu language. It comes with many batteries-included features to help you process Urdu data in the easiest way possible.

48K 309 43

unitxt

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking

33K 212 67

camel-tools

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

24K 548 89

botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

22K 81 15

laonlp

Lao language Natural Language Processing toolkit

20K 34 6

spaczz

Fuzzy matching and more functionality for spaCy.

17K 258 31

medspacy

Library for clinical NLP with spaCy.

13K 649 111

breame

Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English

12K 18 0

khmer-nltk

Khmer natural language processing toolkit

8K 81 19

simstring-pure

A Python implementation of SimString, a simple and efficient algorithm for approximate string matching.

6K 125 17

fern2

A model development structure control for NLP

4K 3 1

pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

4K 476 66

python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

4K 31 5

soe-vinorm

Soe Vinorm: An Effective Text Normalization Toolkit for converting Vietnamese text to its spoken form.

4K 19 8

mlconjug3

A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.

3K 80 13

spacy-udpipe

spaCy + UDPipe

3K 168 9

farm

Framework for fine-tuning and evaluating transformer-based language models

3K 2K 247

contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

3K 1K 151
