Computational Linguistics Python Packages

pythainlp

Thai natural language processing in Python

1.2M 1K 295

botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

22K 81 15

pycantonese

Cantonese Linguistics and NLP

18K 404 41

rustling

A high-performance library for computational linguistics

13K 2 0

pylangacq

Language Acquisition Research Tools

10K 44 18

wordseg

Word segmentation models

6K 5 1

arpa

🐍 Python library for n-gram models in ARPA format

5K 40 14

pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

4K 476 66

python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

4K 31 5

linguaf

Python package for calculating well-known measures in computational linguistics

3K 15 5

finntk

Some simple high level tools for processing Finnish text

2K 7 0

morphseg

An efficient and easy-to-use morpheme segmentation library

2K 2 0

folia

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation used in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

2K 18 5

pybo

🦜 NLP for Tibetan, in Python.

2K 39 13

folia-linguistic-annotation-tool

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.

2K 113 15

calamancy

NLP pipelines for Tagalog using spaCy

1K 69 6

wikipron

Scraping grapheme-to-phoneme data from Wiktionary

916 365 77

fonetika

Russian/English/Estonian/Finnish/Swedish phonetic algorithm based on Soundex and Metaphone

787 53 6

ramonak

Universal library for working with Belarusian-language text in Python

543 0 0

glazing

Unified data models and interfaces for syntactic and semantic frame ontologies.

471 7 0

pystylometry

Comprehensive Python toolkit for stylometry

434 2 0

bllipparser

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

423 228 53

ruts

Library for extracting statistics from Russian-language texts.

375 126 21

docria

Semi-structured Document Model (Next-generation)

360 8 1
