588 dependents
| Package | Description | Downloads/month |
|---|---|---|
| | An open-source storage framework that enables building a Lakehouse architecture ... | 36.1M |
| | | 2.7M |
| | Semantic link for Microsoft Fabric | 2.3M |
| | Databricks Feature Store Client | 621K |
| | An orchestration platform for the development, production, and observation of da... | 504K |
| | PySpark schema generator | 371K |
| | protobuf pyspark conversion | 294K |
| | Petastorm library enables single machine or distributed training and evaluation ... | 287K |
| | Data Engine is a general purpose python package for data engineering. | 162K |
| | CIDP Python SDK | 150K |
| | Spark 3 plugin for flytekit | 118K |
| | Create and manipulate Tableau Hyper files from Apache Spark DataFrames and Spark... | 113K |
| | A tool for regression testing Spark Dataframes in Python | 101K |
| | Seshat python SDK is a library to help create ML data pipelines. | 93K |
| | Cloud-native genomic dataframes and batch computing | 88K |
| | A suite of tools for working with training datasets for interatomic potentials | 83K |
| | Easy tools | 44K |
| | Chronon python API library | 38K |
| | Provides a set of APIs to consume Azure Open Datasets. | 36K |
| | | 35K |
| | Python PMML scoring library for PySpark as SparkML Transformer | 30K |
| | Python Wrappers for Hadoop FileSystem | 26K |
| | | 25K |
| | Framework for simpler Spark Pipelines | 24K |
| | A python library for building machine learning models on Databricks using a fede... | 23K |
| | Tools that make it easier to use FHIR and clinical terminology within data analy... | 19K |
| | RayDP provides simple APIs for running Spark on Ray and integrating Spark with A... | 18K |
| | sparkql: Apache Spark SQL DataFrame schema management for sensible humans | 16K |
| | GDMO native classes for standardized interaction with data objects within Azure ... | 15K |
| | Apache Airflow connector for Ocean for Apache Spark | 13K |
| | MPC Server for PySpark inspired by the LakeSail | 13K |
| | HandySpark - bringing pandas-like capabilities to Spark dataframes | 12K |
| | A Custom Jupyter Widget Library for Power BI | 11K |
| | Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam... | 11K |
| | DBND is an agile pipeline framework that helps data engineering teams track and ... | 11K |
| | Modular toolkit for Data Engineering with PySpark and Delta Lake — schema manage... | 11K |
| | Great Expectations Plugin for Flytekit | 10K |
| | Package containing common code and reusable components for pipelines and dags | 10K |
| | A Python helper package providing streamlined Spark functions for efficient data... | 10K |
| | An orchestration platform for the development, production, and observation of da... | 10K |
| | Spark-based modular ETL pipeline framework. | 9K |
| | sparkxgb module | 9K |
| | An orchestration platform for the development, production, and observation of da... | 8K |
| | Dataproc client library for Spark Connect | 8K |
| | An orchestration platform for the development, production, and observation of da... | 8K |
| | CLI tool for the Zipline AI platform | 8K |
| | | 8K |
| | Shared components on DASK | 8K |
| | Soda SparkDF V4 | 6K |
| | A cluster computing framework for processing large-scale geospatial data | 6K |