PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.2M 6K 363
deanmalmgren
textract

extract text from any document. no muss. no fuss.

381K 5K 673
csurfer
rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

275K 1K 151
bookieio
breadability

Reworked https://www.readability.com/ parsing library (now https://mercury.postlight.com/ is living alternative)

100K 205 25
Lips7
matcher-py

A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matching, implemented in Rust.

92K 18 1
Lilykos
pyphonetics

A Python 3 phonetics library.

83K 139 21
KyleKing
textract-py3

Maintained fork of deanmalmgren/textract to replace '*' dependencies and other updates

53K 14 2
JasonKessler
scattertext

Beautiful visualizations of how language differs among document types.

20K 2K 286
aphp
edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.

19K 163 41
biolab
orange3-text

🍊 📄 Text Mining add-on for Orange3

11K 134 86
averbis
averbis-python-api

Conveniently access the REST API of Averbis products using Python

5K 12 5
vmenger
deduce

Deduce: de-identification method for Dutch medical text

5K 64 27
huspacy
huspacy

HuSpaCy: industrial-strength Hungarian natural language processing

3K 182 18
mesejo
trrex

Efficient string matching with regular expressions

3K 146 7
PetrKorab
arabica

Python package for text mining of time-series data

3K 75 16
huspacy
huspacy-nightly

HuSpaCy: industrial-strength Hungarian natural language processing

3K 182 18
rosette-api
rosette-api

Babel Street Analytics Client Library for Python

3K 38 37
stephenhky
shorttext

Various Algorithms for Short Text Mining

2K 471 74
lasigeBioTM
bent

Biomedical Term Annotator

2K 9 1
vgrabovets
multi-rake

Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

2K 272 37
jbesomi
texthero

Text preprocessing, representation and visualization from zero to hero.

2K 3K 237
sergioburdisso
pyss3

A Python library for Interpretable Machine Learning in Text Classification using the SS3 model, with easy-to-use visualization tools for Explainable AI

1K 348 44
ronaldgosso
semantic-keywords

TF-IDF counts words. semantic-keywords understands meaning. It uses sentence embeddings (all-MiniLM-L6-v2 by default) and Maximal Marginal Relevance (MMR) to return keywords that are both relevant and diverse — not just the most frequent phrases. Works fully offline after a one-time model download. No API key. No rate limits.

1K 0 0
cgshep
pyeditdistance

A pure, minimalist, no-dependency Python library of various edit distances.

1K 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery