PyPI Stats

Parquet Python Packages

Python packages with the GitHub topic parquet. Sorted by relevance, with stars and monthly downloads.
apache
pyarrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

398.6M 17K 4K
Eventual-Inc
daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

833K 5K 457
InfluxCommunity
influxdb3-python

Python module that provides a simple and convenient way to interact with InfluxDB 3.0.

391K 99 17
scikit-hep
awkward0

Manipulate arrays of complex data structures as easily as Numpy.

326K 214 39
uber
petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

290K 2K 284
cldellow
parquet-metadata

Dump metadata about a Parquet file.

206K 11 2
ktrueda
parquet-tools

easy install parquet-tools

123K 183 24
developmentseed
lonboard

Fast, interactive geospatial data visualization in Jupyter.

38K 940 52
dask-contrib
dask-deltatable

A Delta Lake reader for Dask

37K 54 17
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

34K 2K 129
quiltdata
quilt3

Quilt is a Scientific Data Management Platform on AWS that helps teams and AI find, trust, and reuse data through deeply versioned, context-rich data packages.

31K 1K 90
andreax79
airflow-provider-xlsx

Airflow operators for converting XLSX files from/to Parquet/CSV/JSON

21K 7 1
godalida
koala-diff

High-performance data diff tool in Rust.

15K 4 0
zachspar
parquet-py

A simple command-line interface & Python API for parquet

13K 1 0
Eventual-Inc
daft-lts

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

11K 5K 457
andree0
fast-xml-flattener

Fast XML flattening library with Python bindings

8K 3 0
MorePET
mat-vis-client

Pure Python client for mat-vis PBR textures — HTTP range reads, zero deps

6K 0 0
paradigmxyz
cryo

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes

6K 2K 182
lmmx
polars-config-meta

A Polars plugin for persistent DataFrame-level metadata

6K 20 2
RecordEvolution
imctermite

Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats

6K 33 11
SouravRoy-ETL
slothdb

SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch. Up to 5x faster where it counts.

6K 418 3
arrowjet
arrowjet

The fastest way to move data in and out of a database.

5K 1 1
OpenDataLab
vis3

Data browser based on S3: a visualization tool for S3-hosted data (json / jsonl / parquet / html / md, etc.).

5K 84 14
abdubakr77
deepcsv

Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files and MORE!

4K 4 2
    • Data from PyPI, GitHub, ClickHouse, and BigQuery