PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
skrub-data
skrub

Machine learning with dataframes

206K 2K 214
ironmussa
optimuspyspark

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

6K 2K 232
desbordante
desbordante

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

3K 477 99
fairtracks
omnipy

Omnipy is a high level Python library for type-driven data wrangling and scalable workflow orchestration (under development)

3K 26 1
whythawk
whyqd

data wrangling simplicity, complete audit transparency, and at speed

2K 35 1
ContextLab
hypertools

A Python package for visualizing and manipulating high-dimensional data

2K 2K 162
pmgraham
datagrunt

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.

1K 10 2
grzegorzme
data-toolz

Simple Python library for handling data I/O tasks

1K 7 0
hi-primus
pyoptimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

1K 2K 232
vikrantdeshpande09876
anonymized-fraud-detection

A small package to parse anonymized credit card transactions and train an ML model on them. Refer to the GitHub wiki for more details. The package was built for PythonVirtualenvOperator() on GCP Airflow.

813 2 1
VianneyMI
monggregate

MongoDB aggregation pipelines made easy. Joins, grouping, counting and much more...

733 22 3
LucaCappelletti94
csv-trimming

Python package to remove common ugliness from a CSV-like file

651 106 0
ContextLab
pydata-wrangler

Wrangle messy data into DataFrames (pandas or Polars), with a special focus on text data and natural language processing

460 10 2
laoluadewoye
skloverlay

This repository is the official location of the SKLOverlay Project. It holds everything used for the package on PyPI, including source files.

427 0 0
LukasHedegaard
datasetops

Fluent dataset operations, compatible with your favorite libraries

346 11 0
nxank4
loclean

High-performance, local-first semantic data cleaning library

314 10 1
asavinov
prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

240 93 5
AliiiBenn
excel-toolkit-cwd

Command-line toolkit for Excel data manipulation and analysis

239 0 0
nateify
ics-fixer

Fix slightly broken iCalendar files

190 0 0
fburic
pandance

Advanced relational operations for pandas DataFrames

181 5 0
PavelGrigoryevDS
frameon

🐼✨ Frameon - enhances pandas DataFrames with analysis methods while preserving all native functionality

179 4 2
fititnt
gis-conflation-toolchain

[EARLY-DRAFT] See geojson-diff.py from https://github.com/fititnt/openstreetmap-vs-dados-abertos-brasil

174 0 0
DataPreprocessing
data-cleaning

Data Cleaning is a Python package for data preprocessing. It cleans a CSV file and returns the cleaned DataFrame, handling imputation, duplicate removal, special-character replacement, and more.

171 9 4
onlozanoo
databroom

A cross-language DataFrame cleaning assistant with interactive GUI and one-click code export

169 7 0
Data from PyPI, GitHub, ClickHouse, and BigQuery