PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
pandera-dev
pandera

A light-weight, flexible, and expressive statistical data testing library

8.7M 4K 395
lithops-cloud
lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀

145K 364 121
NVIDIA
nvidia-nvimgcodec-cu12

A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface

125K 146 14
svenkreiss
pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

124K 271 45
NVIDIA
nvidia-dali-cuda120

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

83K 6K 663
datachain-ai
datachain

Data Memory: the operational data context layer for AI agents - typed, versioned datasets over images, video, docs and tables

46K 3K 140
allenai
dolma

Data and tools for generating and inspecting OLMo pre-training data.

44K 1K 189
run-house
kubetorch

Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.

29K 1K 57
bytewax
bytewax

Python Stream Processing

27K 2K 109
run-house
runhouse

Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.

26K 1K 57
python-bonobo
bonobo

Extract Transform Load for Python 3.5+

26K 2K 145
crate
cratedb-toolkit

CrateDB Toolkit, an SDK for CrateDB and CrateDB Cloud.

17K 11 4
matthewdeanmartin
untruncate-json

Python library to repair truncated JSON. Translated directly from the original TypeScript version.

16K 5 0
pathwaycom
pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

16K 63K 2K
wq
itertable

⇔ IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLSX, XML, and JSON.

12K 53 11
NVIDIA
nvidia-nvimgcodec-cu11

A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface

12K 146 14
NVIDIA
nvidia-nvimgcodec-cu13

A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface

9K 146 14
tandav
pipe21

Simple functional pipes for Python

8K 19 0
polyaxon
haupt

Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

8K 451 207
abdubakr77
deepcsv

Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files and MORE!

5K 4 2
NVIDIA
nvidia-dali-nightly-cuda120

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

5K 6K 663
kmatarese
glide

Easy ETL

5K 17 2
CouncilDataProject
cdp-backend

Data storage utilities and processing pipelines used by CDP instances.

4K 23 27
CEA-MetroCarac
spectroview

SPECTROview: A Tool for Spectroscopic Data Processing and Visualization.

4K 4 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery