71 dependents
Package Description Downloads/month
35K
Framework for simpler Spark Pipelines 24K
GDMO native classes for standardized interaction with data objects within Azure ... 15K
Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam... 11K
Modular toolkit for Data Engineering with PySpark and Delta Lake — schema manage... 11K
Testing framework that can test the SPF library just by providing input files to se... 5K
Yet Another (Spark) ETL Framework 5K
Fabric BigQuery Data Sync Utility 5K
Program Agnostic Data Ecosystem (PADE) - Python Services 4K
Labelbox Connector for Databricks 3K
Extract FHIR data, Transform with NLP and DEID tools, and then Load FHIR data in... 3K
CDC Data Hub Lifecycle, Analysis and Visualization Accelerator Python 3K
Library to help ETL using pyspark 2K
Data Validation Engine source code 2K
A package that enables interaction with a Kobai tenant. 2K
Python version of RumbleDB 2K
Test with compare 2K
A Python library that simplifies data manipulation and workflow development with... 1K
This is the stream core package of mindx 1K
Test with compare 1K
Data container 1K
1K
Package for Fabric Engineers 1K
An Enterprise-Ready, Declarative Data Engineering Framework for Databricks Lakeh... 1K
Python library for interacting with Spark, Azure, Minio, and other data sources. 1K
An open source data science framework for feature and model deployment 1K
ECHO_modules is a Python package for analyzing US Environmental Protection Agenc... 911
Metadata transformations for Spark 847
A collection of interop, core, and orchestration services for the bclearer frame... 787
spark_delta_batch for bronze > silver > gold > mart auto 725
719
SDMF - Standard Data Management Framework 719
An open-source Python library for simplifying local testing of Databricks workfl... 685
Open-source PySpark toolkit with data sources for REST APIs and SPARQL endpoints... 579
A suite of utilities to support data engineering workloads within an Ensono Stac... 535
data operations related code - extended 518
Helper files/functions/classes for generic PySpark processes 516
A set of utilities for creating and managing ETL Pipelines with pyspark. 502
Spark ETL Utility Framework 379
This project provides an opinionated way to go about crafting Spark Structured S... 319
A Python library designed to assist with implementing the Delta Mesh architectur... 302
Production Synthetic Data Engine with Relational Integrity 300
A modern, intuitive Python package for data lakehouse operations 291
Distributed LLM evaluation framework for Apache Spark 275
A modular, scalable, and Python-based Data Management Framework designed to stre... 258
An utterly useless package that imports everything for you. Now with top 1000 PyP... 247
Standardizex is a Python package that streamlines data standardization for Delta... 237
A collection of tools to help when developing PySpark applications 217
Spark, Delta Lake, and Databricks utility library for Python 208
A data loading and transformation engine for data lakehouses 198