PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
PyThaiNLP
pythainlp

Thai natural language processing in Python

1.2M 1K 295
OpenPecha
botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

22K 81 15
jacksonllee
pycantonese

Cantonese Linguistics and NLP

18K 404 41
jacksonllee
rustling

A high-performance library for computational linguistics

13K 2 0
jacksonllee
pylangacq

Language Acquisition Research Tools

10K 44 18
jacksonllee
wordseg

Word segmentation models

6K 5 1
sfischer13
arpa

Python library for n-gram models in ARPA format

5K 40 14
proycon
pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

4K 476 66
proycon
python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

4K 31 5
Perevalov
linguaf

Python package for calculating well-known measures in computational linguistics

3K 15 5
frankier
finntk

Some simple high-level tools for processing Finnish text

2K 7 0
TheWelcomer
morphseg

An efficient and easy-to-use morpheme segmentation library

2K 2 0
proycon
folia

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

2K 18 5
Esukhia
pybo

🦜 NLP for Tibetan, in Python.

2K 39 13
proycon
folia-linguistic-annotation-tool

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.

2K 113 15
ljvmiranda921
calamancy

NLP pipelines for Tagalog using spaCy

1K 69 6
CUNY-CL
wikipron

Scraping grapheme-to-phoneme data from Wiktionary

916 365 77
roddar92
fonetika

Russian/English/Estonian/Finnish/Swedish phonetic algorithm based on Soundex and Metaphone

787 53 6
alex-rusakevich
ramonak

A universal Python library for working with text in the Belarusian language

543 0 0
factslab
glazing

Unified data models and interfaces for syntactic and semantic frame ontologies.

471 7 0
craigtrim
pystylometry

Comprehensive Python toolkit for stylometry

434 2 0
BLLIP
bllipparser

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

423 228 53
SergeyShk
ruts

A library for extracting statistics from Russian-language texts.

375 126 21
marcusklang
docria

Semi-structured Document Model (Next-generation)

360 8 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery