71 dependents
| Description | Downloads/month |
|---|---|
| | 35K |
| Framework for simpler Spark Pipelines | 24K |
| GDMO native classes for standardized interaction with data objects within Azure ... | 15K |
| Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam... | 11K |
| Modular toolkit for Data Engineering with PySpark and Delta Lake — schema manage... | 11K |
| Testing framework that can tests SPF library by just providing input files to se... | 5K |
| Yet Another (Spark) ETL Framework | 5K |
| Fabric BigQuery Data Sync Utility | 5K |
| Program Agnostic Data Ecosystem (PADE) - Python Services | 4K |
| Labelbox Connector for Databricks | 3K |
| Extract FHIR data, Transform with NLP and DEID tools, and then Load FHIR data in... | 3K |
| CDC Data Hub Lifecycle, Analysis and Visualization Accelerator Python | 3K |
| Library to help ETL using pyspark | 2K |
| Data Validation Engine source code | 2K |
| A package that enables interaction with a Kobai tenant. | 2K |
| Python version of RumbleDB | 2K |
| Test with compare | 2K |
| A Python library that simplifies data manipulation and workflow development with... | 1K |
| This is package stream core of mindx | 1K |
| Test with compare | 1K |
| Data container | 1K |
| | 1K |
| Package for Fabric Engineers | 1K |
| An Enterprise-Ready, Declarative Data Engineering Framework for Databricks Lakeh... | 1K |
| Python library for interacting with Spark, Azure, Minio, and other data sources. | 1K |
| An open source data science framework for feature and model deployment | 1K |
| ECHO_modules is a Python package for analyzing US Environmental Protection Agenc... | 911 |
| Metadata transformations for Spark | 847 |
| A collection of interop, core, and orchestration services for the bclearer frame... | 787 |
| spark_delta_batch for bronze > silve > gold > mart auto | 725 |
| | 719 |
| SDMF - Standard Data Management Framework | 719 |
| An open-source Python library for simplifying local testing of Databricks workfl... | 685 |
| Open-source PySpark toolkit with data sources for REST APIs and SPARQL endpoints... | 579 |
| A suite of utilities to support data engineering workloads within an Ensono Stac... | 535 |
| data operations related code - extended | 518 |
| Helper files/functions/classes for generic PySpark processes | 516 |
| A set of utilities for creating and managing ETL Pipelines with pyspark. | 502 |
| Spark ETL Utility Framework | 379 |
| This project provides an opinionated way to go about crafting Spark Structured S... | 319 |
| A Python library designed to assist with implementing the Delta Mesh architectur... | 302 |
| Production Synthetic Data Engine with Relational Integrity | 300 |
| A modern, intuitive Python package for data lakehouse operations | 291 |
| Distributed LLM evaluation framework for Apache Spark | 275 |
| A modular, scalable, and Python-based Data Management Framework designed to stre... | 258 |
| A utterly useless package that imports everything for you. Now with top 1000 PyP... | 247 |
| Standardizex is a Python package that streamlines data standardization for Delta... | 237 |
| A collection of tools to help when developing PySpark applications | 217 |
| Spark, Delta Lake, and Databricks utility library for Python | 208 |
| A data loading and transformation engine for data lakehouses | 198 |