588 dependents
Package Description Downloads/month
An open-source storage framework that enables building a Lakehouse architecture ... 36.1M
2.7M
Semantic link for Microsoft Fabric 2.3M
Databricks Feature Store Client 621K
An orchestration platform for the development, production, and observation of da... 504K
PySpark schema generator 371K
protobuf pyspark conversion 294K
Petastorm library enables single machine or distributed training and evaluation ... 287K
Data Engine is a general purpose python package for data engineering. 162K
CIDP Python SDK 150K
Spark 3 plugin for flytekit 118K
Create and manipulate Tableau Hyper files from Apache Spark DataFrames and Spark... 113K
A tool for regression testing Spark Dataframes in Python 101K
Seshat python SDK is a library to help create ML data pipelines. 93K
Cloud-native genomic dataframes and batch computing 88K
A suite of tools for working with training datasets for interatomic potentials 83K
Easy tools 44K
Chronon python API library 38K
Provides a set of APIs to consume Azure Open Datasets. 36K
35K
Python PMML scoring library for PySpark as SparkML Transformer 30K
Python Wrappers for Hadoop FileSystem 26K
25K
Framework for simpler Spark Pipelines 24K
A python library for building machine learning models on Databricks using a fede... 23K
Tools that make it easier to use FHIR and clinical terminology within data analy... 19K
RayDP provides simple APIs for running Spark on Ray and integrating Spark with A... 18K
sparkql: Apache Spark SQL DataFrame schema management for sensible humans 16K
GDMO native classes for standardized interaction with data objects within Azure ... 15K
Apache Airflow connector for Ocean for Apache Spark 13K
MPC Server for PySpark inspired by the LakeSail 13K
HandySpark - bringing pandas-like capabilities to Spark dataframes 12K
A Custom Jupyter Widget Library for Power BI 11K
Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam... 11K
DBND is an agile pipeline framework that helps data engineering teams track and ... 11K
Modular toolkit for Data Engineering with PySpark and Delta Lake — schema manage... 11K
Great Expectations Plugin for Flytekit 10K
Package containing common code and reusable components for pipelines and dags 10K
A Python helper package providing streamlined Spark functions for efficient data... 10K
An orchestration platform for the development, production, and observation of da... 10K
Spark-based modular ETL pipeline framework. 9K
sparkxgb module 9K
An orchestration platform for the development, production, and observation of da... 8K
Dataproc client library for Spark Connect 8K
An orchestration platform for the development, production, and observation of da... 8K
CLI tool for the Zipline AI platform 8K
8K
Shared components on DASK 8K
Soda SparkDF V4 6K
A cluster computing framework for processing large-scale geospatial data 6K