PyPI Stats

Text Analysis Python Packages

Python packages with the GitHub topic text-analysis. Sorted by relevance, with stars and monthly downloads.
5j9
wikitextparser

A Python library to parse MediaWiki WikiText

90K 320 24
Lips7
matcher-py

A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matching, implemented in Rust.

84K 18 1
biolab
orange3-text

🍊 📄 Text Mining add-on for Orange3

11K 134 86
shcherbak-ai
contextgem

ContextGem: Effortless LLM extraction from documents

9K 2K 155
NationalLibraryOfNorway
dhlab

DHLAB is a library of python modules for accessing text and pictures at the National Library of Norway.

6K 26 4
jboynyc
textnets

Automated text analysis with networks

5K 294 23
power-of-language
oneai

Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming text from any source into structured data for use in code.

4K 38 7
johnbumgarner
wordhoard

This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.

3K 125 12
quadrismegistus
logmap

A hierarchical, context-manager logger utility with multiprocess mapping capabilities

3K 0 0
convosense
convosense-utilities

Email signature remover: extracts the email body from the email text, using NLP tasks, to get accurate sentiment results.

3K 22 2
BlackMount-ai
blackmount-nlp-mcp

NLP without the bloat — sentiment, keywords, readability, summarization. No NLTK, no spaCy. Zero heavy dependencies.

2K 1 0
microsoft
autobrewml

The AutoBrewML Framework greatly shortens the time it takes to get production-ready ML models, with ease and efficiency.

2K 25 31
rosette-api
rosette-api

Babel Street Analytics Client Library for Python

2K 38 37
nlpie
biomedicus

BioMedICUS: A biomedical and clinical NLP engine.

2K 21 8
twardoch
split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

2K 29 3
welfare-state-analytics
humlab-westac

Welfare State Analytics

2K 5 0
meer-khan
pattex

Regex-based pattern extraction library for Python — emails, URLs, phones, IPs, and more.

2K 0 0
MycroftAI
padatious

A neural network intent parser

2K 162 42
ronaldgosso
semantic-keywords

TF-IDF counts words. semantic-keywords understands meaning. It uses sentence embeddings (all-MiniLM-L6-v2 by default) and Maximal Marginal Relevance (MMR) to return keywords that are both relevant and diverse — not just the most frequent phrases. Works fully offline after a one-time model download. No API key. No rate limits.

1K 0 0
prosegrinder
prosegrinder

A relatively fast, functional prose text counter with readability scoring.

1K 4 2
seandstewart
iambic

Data extraction and rendering library for Shakespearean text.

1K 1 0
sagnik-chakravarty
arcshiftwrap

End-to-end pipeline for collecting, labeling, and analyzing metaphor framing and stance in Reddit and news discourse using LLMs.

1K 1 0
nickduran
align

Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.

1K 54 17
mlchrzan
pairadigm

Concept-Guided Chain-of-Thought (CGCoT) pairwise annotation tool for systematic text evaluation using LLMs. Generate breakdowns, compare items, compute scores, and validate against human judgments. Supports Ollama, Hugging Face, Google Gemini, OpenAI, and Anthropic models.

1K 6 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery