PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
starrocks
starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

454K 12K 2K
apache
pydoris-custom

Apache Doris is an easy-to-use, high performance and unified analytics database.

215K 15K 4K
apache
pydoris

Apache Doris is an easy-to-use, high performance and unified analytics database.

93K 15K 4K
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129
sdebruyn
dbt-fabric-samdebruyn

Maintained and extended fork combining dbt-fabric and dbt-fabricspark

7K 9 2
Mmodarre
lakehouse-plumber

The metadata-driven framework for Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables). Generates production-ready PySpark code for Lakeflow Declarative Pipelines.

5K 56 11
apache
dbt-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

4K 15K 4K
adidas
lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

4K 288 50
datalpia
laketower

Oversee your lakehouse

3K 12 0
databendlabs
databend

Data Agent Ready Warehouse: One for Analytics, Search, AI, Python Sandbox. Rebuilt from scratch. Unified architecture on your S3.

2K 9K 870
Org-EthereaLogic
etherealogic-aetheriaforge

Databricks-native intelligent data transformation engine — coherence-scored Bronze/Silver/Gold with entity resolution and temporal reconciliation in a single deployable product.

2K 1 0
apache
apache-gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

2K 3K 818
ytsaurus
ytsaurus-spyt

YTsaurus is a scalable and fault-tolerant open-source big data platform.

2K 2K 205
apache
redpanda-polaris-catalog-python

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

1K 2K 437
mwc360
lakebench

A multi-modal Python library for benchmarking Azure lakehouse engines and ELT scenarios, supporting both industry-standard and novel benchmarks.

1K 51 17
mag1cfrog
timeseries-table-format

Append-only time-series table format with gap/overlap tracking (Python bindings).

1K 12 1
IBM
ibm-watsonxdata-mcp-server

Model Context Protocol (MCP) server for IBM watsonx.data - enables AI assistants to query and explore lakehouse data

1K 6 3
apache
pyfluss

Apache Fluss (incubating) Python client

1K 47 39
apache
doris-mcp-server

Enterprise-grade Model Context Protocol (MCP) server implementation for Apache Doris

644 291 79
google
space-datasets

Unified storage framework for the entire machine learning lifecycle

524 155 8
datacoolie
datacoolie

Metadata-driven ETL framework for portable data pipelines across Polars, Spark, Fabric, Databricks, and AWS.

364 0 0
apache
python-for-fluss

Python bindings for Fluss

321 47 39
apache
apache-polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

297 2K 437
openaleph
ftm-lakehouse

Data standard and archive storage for structured FollowTheMoney data, leaked data, private and public document collections.

284 5 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery