PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.2M 6K 363
adbar
simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

90K 195 15
flairNLP
fundus

A very simple news crawler with a funny name

5K 452 108
johentsch
ms3

A parser for annotated MuseScore 3 files.

3K 55 6
Helsinki-NLP
opusfilter

OpusFilter - Parallel corpus processing toolkit

1K 115 26
nickduran
align

Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.

1K 54 17
opendatalab
mineru-html

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

626 239 25
liao961120
concordancer

Searching in-memory corpus with Corpus Query Language (CQL)

584 19 3
ynop
audiomate

Python library for handling audio datasets.

555 138 25
mshakirDr
mfte

MFTE (Multi Feature Tagger of English) Python is the Python version based on Le Foll's MFTE written in Perl. It is extended to include semantic tags from Biber (2006) and Biber et al. (1999), including other specific tags.

430 30 3
grammarly
ua-gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

322 270 23
rmalouf
treesearch-ud

High-performance toolkit for querying linguistic dependency parses

255 3 0
edwardseley
lyricscorpora

An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and billboard charts

186 18 1
jonathandunn
corpus-similarity

Measure the similarity of text corpora for 74 languages

148 14 3
koskenni
betastr

An open source reimplementation of Benny Brodda's BETA in Python

138 63 2
    • Data from PyPI, GitHub, ClickHouse, and BigQuery