Delta Lake Python Packages

delta-spark

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs

36.1M 9K 2K

deltalake

A native Rust library for Delta Lake, with bindings into Python

22.7M 3K 615

delta-sharing

An open protocol for secure data sharing

1.4M 938 225

koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

817K 652 39

starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

454K 12K 2K

pydoris-custom

Apache Doris is an easy-to-use, high performance and unified analytics database.

215K 15K 4K

pydoris

Apache Doris is an easy-to-use, high performance and unified analytics database.

93K 15K 4K

hops-deltalake

A native Rust library for Delta Lake, with bindings into Python

59K 3K 615

dask-deltatable

A Delta Lake reader for Dask

37K 54 17

pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129

cdk-emrserverless-with-delta-lake

This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome from the EMR Serverless application.

17K 11 5

delta-lake-reader

Read Delta tables without any Spark

12K 47 14

dbt-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

4K 15K 4K

lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

4K 288 50

roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

3K 3K 211

roapi-http

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

2K 3K 211

duckalog

Build DuckDB catalogs from declarative YAML/JSON configuration files

2K 1 0

sqlshell

A powerful SQL shell with GUI interface for data analysis

1K 1 1

columnq-cli

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

922 3K 211

pfeed

Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store. Supports Data Lakehouse Architecture. Clean Once and Forget.

905 31 7

datacoolie

Metadata-driven ETL framework for portable data pipelines across Polars, Spark, Fabric, Databricks, and AWS.

364 0 0

xdlake

A loose implementation of the deltalake protocol, written in Python on top of pyarrow, focused on extensibility, customizability, and distributed data.

339 4 0

pyspark-streaming-base

This project provides an opinionated way to go about crafting Spark Structured Streaming applications with PySpark

319 5 0

databricks4py

Spark, Delta Lake, and Databricks utility library for Python

208 0 0
