Iceberg Python Packages | PyPI Stats

pyiceberg

PyIceberg

35.8M 1K 481

daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.

814K 5K 457

starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

454K 12K 2K

pydoris-custom

Apache Doris is an easy-to-use, high performance and unified analytics database.

215K 15K 4K

pydoris

Apache Doris is an easy-to-use, high performance and unified analytics database.

93K 15K 4K

opteryx

🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

75K 112 14

pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129

pynessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

26K 1K 173

opteryx-core

🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

13K 112 14

daft-lts

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.

11K 5K 457

dlt-iceberg

An Iceberg destination for DLT that supports REST catalogs

5K 9 5

arrowjet

The fastest way to move data in and out of a database.

5K 1 1

dbt-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

4K 15K 4K

pyducklake

Python toolkit for working with DuckLake

3K 4 1

duckalog

Build DuckDB catalogs from declarative YAML/JSON configuration files

2K 1 0

redpanda-polaris-catalog-python

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

1K 2K 437

datashard

Iceberg robustness, for the rest of us | S3 and Local safe file operations + Pandas support to query your data and logs.

931 4 0

sql2arrow

This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.

696 7 0

torch-dataloader-utils

Efficient Data Loader Utils for loading data from structured sources into PyTorch

437 0 0

duckberg

Python package for querying Iceberg data through DuckDB.

428 75 5

datacoolie

Metadata-driven ETL framework for portable data pipelines across Polars, Spark, Fabric, Databricks, and AWS.

364 0 0

apache-polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

297 2K 437

mcp-timeplus

An MCP server for Timeplus.

258 12 5

apache-iceberg

Apache Iceberg

236 9K 3K