PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
skrub-data
skrub

Machine learning with dataframes

206K 2K 214
ironmussa
optimuspyspark

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

6K 2K 232
amphi-ai
jupyterlab-amphi

Visual data prep powered by Python

3K 1K 106
snowmuffin
convmerge

Merge heterogeneous chat/text sources into a single LLM training format (JSONL)

2K 0 1
sisinflab
datarec-lib

Compatibility wrapper for the renamed DataRec package.

1K 20 1
johanneskasser
hdsemg-select

hdsemg-select package

1K 1 0
hi-primus
pyoptimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

1K 2K 232
amphi-ai
amphi-scheduler

Amphi Scheduler (JupyterLab extension + Python backend)

806 1K 106
sisinflab
datarec

A Python Library for Standardized and Reproducible Data Management in Recommender Systems

775 20 1
CyberCRI
refinedoc

Python library for post-extraction refinement of text that may be derived from PDF extraction.

440 26 3
developmentseed
label-maker

Data Preparation for Satellite Machine Learning

408 469 107
tracebloc
tracebloc-ingestor

tracebloc data pipeline for training/test dataset setup

379 8 0
kozodoi
dptools

Data Preprocessing Tools

363 5 3
Florian-Katerndahl
forestiler

Create Image Tiles From Large Input Rasters According to a Classified Mask Vector File

307 0 0
asavinov
prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

240 93 5
dataclr
dataclr

A Python library for feature selection in tabular datasets

218 20 2
ixlan
machine-learning-data-pipeline

Pipeline module for parallel real-time data processing, for machine learning model development and production use.

194 22 2
NVIDIA
invisible-rabbit

Scalable Data Preprocessing Tool for Training Large Language Models

185 2K 264
ved93
ml-express

A Python library for day-to-day data analysis and machine learning.

177 3 1
maksymsur
spltr

A simple PyTorch-based data loader and splitter

155 1 0
alihanozz
daxpy

A pre-machine-learning model package

80 0 0
NVIDIA
invisible-unicorn

Scalable data preprocessing and curation toolkit for LLMs

71 2K 264
Data from PyPI, GitHub, ClickHouse, and BigQuery