Datalake Python Packages

lakefs-sdk

lakeFS - Data version control for your data lake | Git for data

1.1M 5K 446

lakefs

lakeFS - Data version control for your data lake | Git for data

920K 5K 446

starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

454K 12K 2K

deeplake

Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal datalake, enabling scalable retrieval and training.

228K 9K 709

lakefs-client

lakeFS - Data version control for your data lake | Git for data

211K 5K 446

pandasai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

180K 24K 2K

pandasai-openai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

13K 24K 2K

pan-cortex-data-lake

Python idiomatic SDK for Cortex™ Data Lake.

11K 48 22

pandasai-litellm

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

7K 24K 2K

dlt-iceberg

An Iceberg destination for DLT that supports REST catalogs

5K 9 5

zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

4K 1K 165

pancloud

Python idiomatic SDK for Cortex™ Data Lake.

2K 48 22

apache-gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

2K 3K 818

tarn

An insanely customizable framework for key-value storage 💾

2K 3 0

aws-orbit-overprovisioning

A Data Platform built for AWS, powered by Kubernetes.

1K 147 92

dm-job-lib

Data Manager Job Library

1K 10 2

aws-insurancelake-etl

This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog post "Deploy data lake ETL jobs using CDK Pipelines" and complements the InsuranceLake Infrastructure project.

975 35 16

aws-orbit

A Data Platform built for AWS, powered by Kubernetes.

892 147 92

aws-orbit-custom-cfn

A Data Platform built for AWS, powered by Kubernetes.

823 147 92

aws-orbit-redshift

A Data Platform built for AWS, powered by Kubernetes.

806 147 92

aws-orbit-sdk

A Data Platform built for AWS, powered by Kubernetes.

801 147 92

aws-orbit-team-script-launcher

A Data Platform built for AWS, powered by Kubernetes.

794 147 92

aws-orbit-code-commit

A Data Platform built for AWS, powered by Kubernetes.

791 147 92

aws-orbit-hello-world

A Data Platform built for AWS, powered by Kubernetes.

791 147 92
