PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
apache
pyarrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

391.2M 17K 4K
Eventual-Inc
daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

814K 5K 457
InfluxCommunity
influxdb3-python

Python module that provides a simple and convenient way to interact with InfluxDB 3.0.

384K 99 17
scikit-hep
awkward0

Manipulate arrays of complex data structures as easily as NumPy.

331K 214 39
uber
petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

287K 2K 284
cldellow
parquet-metadata

Dump metadata about a Parquet file.

210K 11 2
ktrueda
parquet-tools

easy install parquet-tools

120K 183 24
developmentseed
lonboard

Fast, interactive geospatial data visualization in Jupyter.

38K 940 52
dask-contrib
dask-deltatable

A Delta Lake reader for Dask

37K 54 17
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129
quiltdata
quilt3

Quilt is a Scientific Data Management Platform on AWS that helps teams and AI find, trust, and reuse data through deeply versioned, context-rich data packages.

31K 1K 90
andreax79
airflow-provider-xlsx

Airflow operators for converting XLSX files from/to Parquet/CSV/JSON

21K 7 1
godalida
koala-diff

High-performance data diff tool in Rust.

15K 4 0
zachspar
parquet-py

A simple command-line interface & Python API for parquet

14K 1 0
Eventual-Inc
daft-lts

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

11K 5K 457
andree0
fast-xml-flattener

Fast XML flattening library with Python bindings

8K 3 0
paradigmxyz
cryo

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes

6K 2K 182
lmmx
polars-config-meta

A Polars plugin for persistent DataFrame-level metadata

6K 20 2
RecordEvolution
imctermite

Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats

6K 33 11
SouravRoy-ETL
slothdb

SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch. Up to 5x faster where it counts.

5K 418 3
abdubakr77
deepcsv

Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files and MORE!

5K 4 2
arrowjet
arrowjet

The fastest way to move data in and out of a database.

5K 1 1
OpenDataLab
vis3

Data browser based on S3. A visualization tool for S3-hosted data (json / jsonl / parquet / html / md, etc.). 👇 Try online.

5K 84 14
mabel-dev
rugo

Parquet Metadata Reader

4K 3 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery