PyPI Stats

Data Lake Python Packages

Python packages with the GitHub topic data-lake. Sorted by relevance, with stars and monthly downloads.
dlt-hub
dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

7.1M 5K 498
treeverse
lakefs-sdk

lakeFS - Data version control for your data lake | Git for data

1.1M 5K 446
treeverse
lakefs

lakeFS - Data version control for your data lake | Git for data

922K 5K 446
treeverse
lakefs-client

lakeFS - Data version control for your data lake | Git for data

196K 5K 446
MatsMoll
aligned

The DBT of ML: Aligned describes data dependencies in ML systems and reduces technical data debt

10K 61 2
crate
dlt-cratedb

dlt destination adapter for CrateDB

6K 0 0
nodestream-proj
nodestream

A Declarative framework for Building, Maintaining, and Analyzing Graph Data

4K 61 17
Canner
wren-core-py

The open context engine for AI agents, supporting 15+ data sources. Built on Rust and Apache DataFusion.

4K 661 197
Canner
wren-engine

The open context engine for AI agents, supporting 15+ data sources. Built on Rust and Apache DataFusion.

3K 661 197
dlt-hub
dlt-core

dlt is an open-source python-first scalable data loading library that does not require any backend to run.

3K 5K 498
mag1cfrog
timeseries-table-format

Append-only time-series table format with gap/overlap tracking (Python bindings).

1K 12 1
nodestream-proj
nodestream-plugin-dotenv

A plugin to nodestream for loading environment variables from a .env file

643 2 0
arpe-io
lakexpress-mcp

A Model Context Protocol (MCP) server for LakeXpress, enabling database to Parquet export with sync management and data lake publishing.

389 0 0
nodestream-proj
nodestream-plugin-meta

A plugin to nodestream for building a graph of the schema of the graph.

335 0 0
nodestream-proj
nodestream-plugin-semantic

A plugin for embedding semantic data into a nodestream project

334 0 0
utndatasystems
virtual-parquet

🗜️Compressing Parquet files using functions (TRL @NeurIPS'24, EDBT Best Demo'25)

309 0 1
nodestream-proj
nodestream-plugin-pedantic

A nodestream plugin that provides a series of audits to ensure high quality and consistent nodestream projects.

290 2 0
treeverse
lakefs-sdk-async

lakeFS - Data version control for your data lake | Git for data

266 5K 446
realdatadriven
etlx-wrapper

Python wrapper for ETLX CLI to run ETL workflows from Python

170 40 3
Canner
vulcan-sql

Data API Framework for AI Agents and Data Apps

149 794 42
SRRC-1334
ztract

Extract mainframe EBCDIC data using COBOL copybooks. Zero MIPS. Pure Python + Cobrix engine.

134 0 0
dlt-hub
dlt-dataops

dlt is an open-source python-first scalable data loading library that does not require any backend to run.

91 5K 498
    • Data from PyPI, GitHub, ClickHouse, and BigQuery