ETL Framework Python Packages

sf-hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

169K 2K 187
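Below is a minimal, dependency-free sketch of the idea behind Hamilton-style dataflows. It assumes the convention that a function's name is the output it produces and its parameter names are the inputs it depends on. `run_dataflow`, `spend_per_signup`, and `spend_doubled` are hypothetical names for illustration. This is not the sf-hamilton API, which builds and executes the graph through its own driver.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable


# Hypothetical node functions: the function name is the output,
# the parameter names are the upstream dependencies.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_doubled(spend: float) -> float:
    return spend * 2


def run_dataflow(
    funcs: list[Callable[..., Any]],
    outputs: list[str],
    inputs: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, list[str]]]:
    """Resolve requested outputs recursively and record each node's direct dependencies (lineage)."""
    registry = {f.__name__: f for f in funcs}
    cache: dict[str, Any] = dict(inputs)  # user-supplied inputs are leaf nodes
    lineage: dict[str, list[str]] = {}

    def compute(name: str) -> Any:
        if name in cache:
            return cache[name]
        if name not in registry:
            raise KeyError(f"no input or function provides {name!r}")
        fn = registry[name]
        deps = list(inspect.signature(fn).parameters)  # parameter names = dependencies
        lineage[name] = deps
        cache[name] = fn(**{d: compute(d) for d in deps})
        return cache[name]

    return {o: compute(o) for o in outputs}, lineage
```

Because each node is a plain function, it can be unit-tested on its own without running the whole flow. The sketch does no cycle detection and runs everything in a single process.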

pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

16K 63K 2K
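As a rough illustration of what stream processing for real-time analytics involves, here is a plain-Python sketch of incremental tumbling-window aggregation. State is updated per event and an updated result is emitted immediately, instead of recomputing over the full history. This is not Pathway's API. `tumbling_window_sums` and the `(timestamp, key, value)` event shape are assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str, float]  # hypothetical shape: (timestamp_seconds, key, value)


def tumbling_window_sums(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[Tuple[int, str], float]]:
    """Yield ((window_start, key), running_sum) after every event.

    Each event updates only the state for its own window and key, so the cost
    per event is constant. Late events simply update their older window; there
    is no watermarking or window closing in this sketch.
    """
    sums: defaultdict = defaultdict(float)
    for ts, key, value in events:
        window_start = (ts // window_seconds) * window_seconds
        sums[(window_start, key)] += value
        yield (window_start, key), sums[(window_start, key)]
```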

sf-hamilton-sdk

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

15K 2K 186

flowerpower

Simple Workflow Framework based on Hamilton

14K 24 1

apache-hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

13K 2K 187

dotflow

🎲 Dotflow turns an idea into flow! — Lightweight Python library for execution pipelines

12K 5 8

sf-hamilton-lsp

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

10K 2K 186

sf-hamilton-ui

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

10K 2K 186

lakehouse-plumber

The metadata-driven framework for Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables). Generates production-ready PySpark code for Lakeflow Declarative Pipelines from metadata.

5K 56 11

amsdal-glue-connections

A Python library for querying multiple databases simultaneously through a unified interface, enabling data virtualization without moving data.

3K 4 0
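The data-virtualization idea behind the amsdal-glue packages (querying several databases through one interface without copying data into a central store) can be sketched generically with two independent in-memory SQLite databases. Each source runs its own query, and only the small result sets are combined in Python. This is not the amsdal-glue API. `make_sources`, `federated_totals`, and the table layouts are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_sources() -> Dict[str, sqlite3.Connection]:
    """Create two separate databases standing in for independent systems."""
    users = sqlite3.connect(":memory:")
    users.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    users.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

    orders = sqlite3.connect(":memory:")
    orders.execute("CREATE TABLE orders (user_id INTEGER, total REAL)")
    orders.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (1, 5.5), (2, 7.0)])
    return {"users_db": users, "orders_db": orders}


def federated_totals(sources: Dict[str, sqlite3.Connection]) -> List[Tuple[str, float]]:
    """Push the aggregation down to the orders source, then join with users in memory."""
    names = dict(sources["users_db"].execute("SELECT id, name FROM users"))
    totals = sources["orders_db"].execute(
        "SELECT user_id, SUM(total) FROM orders GROUP BY user_id ORDER BY user_id"
    )
    return [(names[user_id], total) for user_id, total in totals]
```

Pushing the `SUM ... GROUP BY` down to the source keeps the data movement small: only one row per user crosses the boundary, not every order.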

butterfree

A tool for building feature stores.

3K 318 38

sqlbucket

Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.

2K 74 9
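To make "write, orchestrate and test your SQL ETL" concrete, here is a sketch using the standard-library `sqlite3` module: SQL steps run in a fixed order, then integrity-check queries run and must each return 0 to pass. This is not sqlbucket's interface. `ETL_STEPS`, `INTEGRITY_CHECKS`, `run_etl`, and the sample tables are hypothetical.

```python
import sqlite3
from typing import Dict, List

# Hypothetical ETL steps, executed in order.
ETL_STEPS: List[str] = [
    "CREATE TABLE raw_sales (region TEXT, amount REAL)",
    "INSERT INTO raw_sales VALUES ('north', 100.0), ('south', 50.0), ('north', 25.0)",
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM raw_sales GROUP BY region",
]

# Hypothetical integrity checks: each query returns 0 when the check passes.
INTEGRITY_CHECKS: Dict[str, str] = {
    "no_null_regions": "SELECT COUNT(*) FROM sales_by_region WHERE region IS NULL",
    "totals_match": (
        "SELECT ABS((SELECT SUM(amount) FROM raw_sales)"
        " - (SELECT SUM(total) FROM sales_by_region)) > 1e-9"
    ),
}


def run_etl(conn: sqlite3.Connection, steps: List[str], checks: Dict[str, str]) -> List[str]:
    """Run each step in order, then return the names of any failing integrity checks."""
    for sql in steps:
        conn.execute(sql)
    conn.commit()
    return [name for name, sql in checks.items() if conn.execute(sql).fetchone()[0] != 0]
```

Keeping the checks as separate SQL queries means they can be rerun after every load, so a broken transformation shows up as a named failure instead of silently wrong totals.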

amsdal-glue-core

A Python library for querying multiple databases simultaneously through a unified interface, enabling data virtualization without moving data.

2K 4 0

amsdal-glue

A Python library for querying multiple databases simultaneously through a unified interface, enabling data virtualization without moving data.

1K 4 0

dataforge-core

DataForge helps data teams write functional transformation pipelines by leveraging software engineering principles

1K 59 2

kgtk

Knowledge Graph Toolkit

1K 418 62
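As a small illustration of working with a knowledge graph stored as an edge list, the sketch below reads a tab-separated file with `node1`, `label`, and `node2` columns and computes each node's out-degree. The column names follow the edge layout KGTK uses, but the sample data, `SAMPLE_EDGES`, and `out_degree` are hypothetical, and this uses only the standard library, not KGTK's own commands.

```python
import csv
import io
from collections import Counter

# Hypothetical sample edges in a node1 / label / node2 tab-separated layout.
SAMPLE_EDGES = (
    "node1\tlabel\tnode2\n"
    "alice\tknows\tbob\n"
    "alice\tworks_at\tacme\n"
    "bob\tknows\tcarol\n"
)


def out_degree(tsv_text: str) -> Counter:
    """Count outgoing edges per node1 value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["node1"] for row in reader)
```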

canonada

Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python

1K 1 2

parade-manage

A management module for parade

789 0 0

cratedb-fivetran-destination

CrateDB Fivetran Destination

645 0 0

amsdal-glue-sql-parser

AMSDAL Glue is a Python interface providing a high-level abstraction for interacting with multiple databases simultaneously, simplifying development and maintenance.

599 4 0