PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
narwhals-dev
narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

82.4M 2K 190
zen-xu
pyarrow-stubs

Type annotations for pyarrow

3.2M 50 24
ibis-project
ibis-framework

the portable Python dataframe library

1.9M 7K 716
uber
petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

287K 2K 284
andree0
fast-xml-flattener

Fast XML flattening library with Python bindings

8K 3 0
shloktech
keyedstablehash

Stable, keyed hashing for Python objects and columnar data. Think `stablehash`, but with SipHash-like keyed PRF semantics so hashes are deterministic for a given key and resistant to adversarial inputs.

7K 1 0
vertti
daffy

Lightweight DataFrame validation decorators for Pandas, Polars, Modin, and PyArrow. No custom types required.

4K 58 5
legout
fsspec-utils

Enhanced utilities and extensions for fsspec filesystems with multi-format I/O support

2K 2 0
rebase-energy
timedatamodel

A lightweight data model for time series data with pandas, numpy, and polars support

1K 8 2
pmgraham
datagrunt

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.

1K 10 2
ismailhammounou
db2ixf

db2ixf is a Python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.

1K 16 1
trustedshops-public
schema2pyarrow

Converts AsyncAPI and JSON Schema to PyArrow schema

889 12 0
terrylica
exness-data-preprocess

Professional forex tick data preprocessing with unified DuckDB storage, Phase7 OHLC schema, and sub-15ms query performance

801 4 0
goalzz85
sql2arrow

This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.

696 7 0
thread53
pqviewer

View Apache Parquet Files In Your Terminal

664 21 0
icaropires
pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

596 19 5
jaysnm
dremio-arrow

Dremio Arrow Flight Client

537 4 4
stefur
swemaps

Maps of Sweden in GeoParquet

476 2 1
Genentech
pysummaries

Generate beautiful summary tables from pandas, polars or pyarrow dataframes

388 33 3
xbrianh
xdlake

A loose implementation of the deltalake protocol, written in Python on top of pyarrow, focused on extensibility, customizability, and distributed data.

339 4 0
ibis-project
turntable-spoonbill

the portable Python dataframe library

311 7K 716
itsbigspark
pymetagen

Metadata Generator

289 0 0
psmyth94
biosets

A bioinformatics extension of 🤗 Datasets library, built for ML applications on biological and omics data, offering easy integration of metadata and low-code data management tools.

256 3 0
SaelKimberly
rxls

Reading both XLSX and XLSB files, fast and memory-safe, with Python, into PyArrow

254 12 3

Data from PyPI, GitHub, ClickHouse, and BigQuery