PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.
treeverse
lakefs-sdk

lakeFS - Data version control for your data lake | Git for data

1.1M 5K 446
treeverse
lakefs

lakeFS - Data version control for your data lake | Git for data

920K 5K 446
starrocks
starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

454K 12K 2K
activeloopai
deeplake

Deeplake is AI Data Runtime for Agents. It provides serverless postgres with a multimodal datalake, enabling scalable retrieval and training.

228K 9K 709
treeverse
lakefs-client

lakeFS - Data version control for your data lake | Git for data

211K 5K 446
sinaptik-ai
pandasai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

180K 24K 2K
sinaptik-ai
pandasai-openai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

13K 24K 2K
PaloAltoNetworks
pan-cortex-data-lake

Python idiomatic SDK for Cortex™ Data Lake.

11K 48 22
sinaptik-ai
pandasai-litellm

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

7K 24K 2K
sidequery
dlt-iceberg

An Iceberg destination for DLT that supports REST catalogs

5K 9 5
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

4K 1K 165
PaloAltoNetworks
pancloud

Python idiomatic SDK for Cortex™ Data Lake.

2K 48 22
apache
apache-gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

2K 3K 818
neuro-ml
tarn

An insanely customizable framework for key-value storage 💾

2K 3 0
awslabs
aws-orbit-overprovisioning

A Data Platform built for AWS, powered by Kubernetes.

1K 147 92
stonezhong
dm-job-lib

Data Manager Job Library

1K 10 2
aws-samples
aws-insurancelake-etl

This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog "Deploy data lake ETL jobs using CDK Pipelines" and complements the InsuranceLake Infrastructure project.

975 35 16
awslabs
aws-orbit

A Data Platform built for AWS, powered by Kubernetes.

892 147 92
awslabs
aws-orbit-custom-cfn

A Data Platform built for AWS, powered by Kubernetes.

823 147 92
awslabs
aws-orbit-redshift

A Data Platform built for AWS, powered by Kubernetes.

806 147 92
awslabs
aws-orbit-sdk

A Data Platform built for AWS, powered by Kubernetes.

801 147 92
awslabs
aws-orbit-team-script-launcher

A Data Platform built for AWS, powered by Kubernetes.

794 147 92
awslabs
aws-orbit-code-commit

A Data Platform built for AWS, powered by Kubernetes.

791 147 92
awslabs
aws-orbit-hello-world

A Data Platform built for AWS, powered by Kubernetes.

791 147 92
    • Data from PyPI, GitHub, ClickHouse, and BigQuery