PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
delta-io
delta-spark

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

36.1M 9K 2K
delta-io
deltalake

A native Rust library for Delta Lake, with bindings into Python

22.7M 3K 615
delta-io
delta-sharing

An open protocol for secure data sharing

1.4M 938 225
Nike-Inc
koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

817K 652 39
starrocks
starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

454K 12K 2K
apache
pydoris-custom

Apache Doris is an easy-to-use, high performance and unified analytics database.

215K 15K 4K
apache
pydoris

Apache Doris is an easy-to-use, high performance and unified analytics database.

93K 15K 4K
delta-io
hops-deltalake

A native Rust library for Delta Lake, with bindings into Python

59K 3K 615
dask-contrib
dask-deltatable

A Delta Lake reader for Dask

37K 54 17
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129
HsiehShuJeng
cdk-emrserverless-with-delta-lake

This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome from the EMR Serverless application.

17K 11 5
jeppe742
delta-lake-reader

Read Delta tables without any Spark

12K 47 14
apache
dbt-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

4K 15K 4K
adidas
lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

4K 288 50
roapi
roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

3K 3K 211
roapi
roapi-http

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

2K 3K 211
legout
duckalog

Build DuckDB catalogs from declarative YAML/JSON configuration files

2K 1 0
oyvinrog
sqlshell

A powerful SQL shell with GUI interface for data analysis

1K 1 1
roapi
columnq-cli

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

922 3K 211
PFund-Software-Ltd
pfeed

Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store. Supports Data Lakehouse Architecture. Clean Once and Forget.

905 31 7
datacoolie
datacoolie

Metadata-driven ETL framework for portable data pipelines across Polars, Spark, Fabric, Databricks, and AWS.

364 0 0
xbrianh
xdlake

A loose implementation of the deltalake protocol, written in Python on top of pyarrow, focused on extensibility, customizability, and distributed data.

339 4 0
datacircus
pyspark-streaming-base

This project provides an opinionated way to go about crafting Spark Structured Streaming applications with PySpark

319 5 0
kirankbs
databricks4py

Spark, Delta Lake, and Databricks utility library for Python

208 0 0

Data from PyPI, GitHub, ClickHouse, and BigQuery