PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

5.2M 15K 1K
ikegami-yukino
jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

2.9M 347 33
KinWaiCheuk
nnaudio

Audio processing using a PyTorch 1D convolutional network

243K 1K 97
MinishLab
semhash

Fast Multimodal Semantic Deduplication & Filtering

53K 919 56
ikegami-yukino
neologdn

Japanese text normalizer for mecab-neologd

43K 289 20
sappelhoff
pyprep

PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data

42K 173 36
winedarksea
autots

Automated Time Series Forecasting

25K 1K 123
sutariyaraj
indic-num2words

Python library for converting numbers to words for all Indian Languages.

23K 36 14
R1j1t
contextualspellcheck

✔️Contextual word checker for better suggestions (not actively maintained)

22K 419 65
erdogant
df2onehot

Convert an unstructured array into a structured dataframe.

18K 3 2
doubleBite
nums-from-string

Extract numbers from a string

13K 2 0
sunlabuiuc
pyhealth

A Deep Learning Python Toolkit for Healthcare Applications.

12K 2K 769
autoreject
autoreject

Automated rejection and repair of bad trials/sensors in M/EEG

11K 157 60
calvinmccarter
kditransform

Kernel density integral transformation: feature preprocessing and univariate clustering (TMLR, 2023)

9K 9 0
NVIDIA-Merlin
nvtabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

9K 1K 149
sccn
eegprep

EEGPrep is an automated preprocessing tool for human EEG data built on a benchmarked EEGLAB pipeline

8K 21 4
allenai
smashed

SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.

6K 35 6
fkie-cad
logprep

Log data preprocessing, generation, and shipping in Python

5K 36 10
dongrixinyu
jionlp

A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com

5K 4K 443
EttoreRocchi
maldiamrkit

Comprehensive toolkit for MALDI-TOF mass spectrometry data preprocessing for antimicrobial resistance (AMR) prediction purposes

4K 3 1
jbusecke
cmip6-preprocessing

Analysis-ready CMIP6 data in Python the easy way with Pangeo tools.

3K 203 43
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
dfd
sktutor

sktutor helps your machines learn

3K 2 1
nlgranger
seqtools

A Python library to manipulate and transform indexable data (lists, arrays, ...)

3K 47 4
    • Data from PyPI, GitHub, ClickHouse, and BigQuery