PyPI Stats

Text Processing Python Packages

Python packages with the GitHub topic text-processing. Sorted by relevance, with stars and monthly downloads.
pyparsing
pyparsing

Python library for creating PEG parsers

363.9M 2K 310
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

78.5M 10K 718
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.4M 10K 718
derek73
nameparser

A simple Python module for parsing human names into their individual components

4.1M 707 105
ikegami-yukino
jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

2.9M 347 33
PyThaiNLP
pythainlp

Thai natural language processing in Python

1.2M 1K 295
kreuzberg-dev
html-to-markdown

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.

490K 694 55
Ailln
proces

🐨 Text preprocessing.

222K 5 0
thombashi
humanreadable

humanreadable is a Python library to convert human-readable values to other units.

160K 21 1
wenet-e2e
wetextprocessing

Text Normalization & Inverse Text Normalization

114K 758 102
Lips7
matcher-py

A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matching, implemented in Rust.

88K 18 1
casics
nostril-detector

Nostril: Nonsense String Evaluator

31K 199 34
voidful
tfkit

🤖📇 Handling multiple NLP tasks in one pipeline

18K 57 6
swen128
twitter-text-parser

Twitter Text Libraries for Python

18K 29 3
daac-tools
daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

14K 21 1
jacksonllee
rustling

A high-performance library for computational linguistics

13K 2 0
PyThaiNLP
nlpo3

Thai natural language processing library in Rust, with Python and Node bindings.

11K 44 13
roshan-research
hazm

Persian NLP Toolkit

9K 1K 205
shner-elmo
flashtext2

Flashtext implementation in Rust

9K 11 1
vmenger
deduce

Deduce: de-identification method for Dutch medical text

5K 64 27
timminator
wordninja-enhanced

Probabilistically split concatenated words. Now with more functionality and languages!

5K 4 0
farfarfun
funread

Document reading and parsing toolkit: supports reading and parsing a variety of document formats

4K 1 0
rajatim
zhtw

Taiwan Traditional Chinese quality tool for AI-generated content (CLI + 6-language SDK, 31K terms / 100M-char zero-mistranslation validation)

4K 9 0
proycon
pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

4K 476 66
    • Data from PyPI, GitHub, ClickHouse, and BigQuery