PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
apache
sf-hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

169K 2K 187
pathwaycom
pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

16K 63K 2K
dagworks-inc
sf-hamilton-sdk

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

15K 2K 186
legout
flowerpower

Simple Workflow Framework based on Hamilton

14K 24 1
apache
apache-hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

13K 2K 187
dotflow-io
dotflow

🎲 Dotflow turns an idea into flow! — Lightweight Python library for execution pipelines

12K 5 8
dagworks-inc
sf-hamilton-lsp

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

10K 2K 186
dagworks-inc
sf-hamilton-ui

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

10K 2K 186
Mmodarre
lakehouse-plumber

The metadata-driven framework for Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables). A metadata framework that generates production-ready PySpark code for Lakeflow Declarative Pipelines.

5K 56 11
amsdal
amsdal-glue-connections

A Python library for querying multiple databases simultaneously through a unified interface, enabling data virtualization without moving data.

3K 4 0
quintoandar
butterfree

A tool for building feature stores.

3K 318 38
socialpoint-labs
sqlbucket

Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.

2K 74 9
amsdal
amsdal-glue-core

A Python library for querying multiple databases simultaneously through a unified interface, enabling data virtualization without moving data.

2K 4 0
amsdal
amsdal-glue

A Python library for querying multiple databases simultaneously through a unified interface, enabling data virtualization without moving data.

1K 4 0
dataforgelabs
dataforge-core

DataForge helps data teams write functional transformation pipelines by leveraging software engineering principles

1K 59 2
usc-isi-i2
kgtk

Knowledge Graph Toolkit

1K 418 62
RLado
canonada

Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python

1K 1 2
cpzt
parade-manage

A manage module for parade

789 0 0
crate
cratedb-fivetran-destination

CrateDB Fivetran Destination

645 0 0
amsdal
amsdal-glue-sql-parser

AMSDAL Glue is a Python interface providing high-level abstraction for interacting with multiple databases simultaneously, simplifying the development and maintenance process.

599 4 0
ContextData
vector-etl

Lightweight ETL Framework for Vector Databases

574 108 19
geopython
stetl

Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.

524 88 33
pyprogrammerblog
tiny-blocks

Tiny Block Operations for Data Pipelines

509 3 0
dagworks-inc
sf-hamilton-contrib

Hamilton's user contributed shared dataflow library.

441 2K 186
    • Data from PyPI, GitHub, ClickHouse, and BigQuery