PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
apache
pyiceberg

PyIceberg

35.8M 1K 481
Eventual-Inc
daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

814K 5K 457
starrocks
starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

454K 12K 2K
apache
pydoris-custom

Apache Doris is an easy-to-use, high performance and unified analytics database.

215K 15K 4K
apache
pydoris

Apache Doris is an easy-to-use, high performance and unified analytics database.

93K 15K 4K
mabel-dev
opteryx

🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

75K 112 14
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129
projectnessie
pynessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

26K 1K 173
mabel-dev
opteryx-core

🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

13K 112 14
Eventual-Inc
daft-lts

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

11K 5K 457
sidequery
dlt-iceberg

An Iceberg destination for DLT that supports REST catalogs

5K 9 5
arrowjet
arrowjet

The fastest way to move data in and out of database.

5K 1 1
apache
dbt-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

4K 15K 4K
jghoman
pyducklake

Python toolkit for working with Ducklake

3K 4 1
legout
duckalog

Build DuckDB catalogs from declarative YAML/JSON configuration files

2K 1 0
apache
redpanda-polaris-catalog-python

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

1K 2K 437
rodmena-limited
datashard

Iceberg robustness, for the rest of us | S3 and Local safe file operations + Pandas support to query your data and logs.

931 4 0
goalzz85
sql2arrow

This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.

696 7 0
srpraneeth
torch-dataloader-utils

Efficient Data Loader Utils for loading data from structured sources into Pytorch

437 0 0
slidoapp
duckberg

Python package for querying iceberg data through duckdb.

428 75 5
datacoolie
datacoolie

Metadata-driven ETL framework for portable data pipelines across Polars, Spark, Fabric, Databricks, and AWS.

364 0 0
apache
apache-polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

297 2K 437
jovezhong
mcp-timeplus

An MCP server for Timeplus.

258 12 5
apache
apache-iceberg

Apache Iceberg

236 9K 3K
Data from PyPI, GitHub, ClickHouse, and BigQuery