PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.2M 6K 363
AzizNadirov
textlasso

TextLasso is a simple Python library for extracting structured data from raw text, with a special focus on processing LLM (Large Language Model) responses.

2K 3 0
blmoistawinde
harvesttext

No description available

2K 3K 338
rhnfzl
squeakycleantext

Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language detection, stopword removal. Built for statistical ML and language models.

2K 8 0
currentsapi
extractnet

Extract the main article content (and optionally comments) from a web page

1K 297 26
hscspring
pnlp

NLP pre-/post-processing tools.

1K 30 6
wisupai
wisup-e2m

E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.

839 1K 72
infinitode
valx

An open-source Python library for data cleaning tasks. Includes profanity detection and removal. Now includes offensive language and hate speech detection using an AI model.

796 5 1
Ankur3107
nlp-preprocessing

Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.

509 18 7
Aayushpatel007
topicrankpy

A Python package to get useful information from documents using the TopicRank algorithm.

271 16 3
alinapetukhova
textcl

Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/

211 12 4
mim-solutions
mim-nlp

A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.

162 2 0
pszemraj
rehuman

Python bindings for rehuman: Unicode-safe text cleaning & normalization

145 0 0
hscspring
hnlp

NLP pre-/post-processing tools.

132 30 6
sharejing
takin

A Python toolkit for file processing, text cleaning and data splitting.

104 36 7
b-a-sabbir
banglish-stopwords

A lightweight Python library to filter 350+ Banglish stopwords for NLP and text cleaning.

103 0 0
umapornp
textprepro

Everything Everyway All At Once Text Preprocessing.

97 2 0
YuvanJain
text-cleaner-yuvan

A simple text cleaning tool for NLP.

62 0 0
ternaus
ternaus-cleantext

Clean text from extra spaces and special symbols as in the CLIP model.

29 2 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery