Parquet Python Packages

pyarrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

391.2M 17K 4K

daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

814K 5K 457

influxdb3-python

Python module that provides a simple and convenient way to interact with InfluxDB 3.0.

384K 99 17

awkward0

Manipulate arrays of complex data structures as easily as NumPy.

331K 214 39

petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

287K 2K 284

parquet-metadata

Dump metadata about a Parquet file.

210K 11 2

parquet-tools

easy install parquet-tools

120K 183 24

lonboard

Fast, interactive geospatial data visualization in Jupyter.

38K 940 52

dask-deltatable

A Delta Lake reader for Dask

37K 54 17

pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129

quilt3

Quilt is a Scientific Data Management Platform on AWS that helps teams and AI find, trust, and reuse data through deeply versioned, context-rich data packages.

31K 1K 90

airflow-provider-xlsx

Airflow operators for converting XLSX files from/to Parquet/CSV/JSON

21K 7 1

koala-diff

High-performance data diff tool in Rust.

15K 4 0

parquet-py

A simple command-line interface & Python API for parquet

14K 1 0

daft-lts

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

11K 5K 457

fast-xml-flattener

Fast XML flattening library with Python bindings

8K 3 0

cryo

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes

6K 2K 182

polars-config-meta

A Polars plugin for persistent DataFrame-level metadata

6K 20 2

imctermite

Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats

6K 33 11

slothdb

SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch. Up to 5x faster where it counts.

5K 418 3

deepcsv

Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files and MORE!

5K 4 2

arrowjet

The fastest way to move data in and out of a database.

5K 1 1

vis3

Data browser based on S3: a visualization tool for S3-hosted data (json / jsonl / parquet / html / md, etc.).

5K 84 14

rugo

Parquet Metadata Reader

4K 3 0
