PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
apache
pyspark

Apache Spark - A unified analytics engine for large-scale data processing

52.2M 43K 29K
nteract
papermill

📚 Parameterize, execute, and analyze notebooks

7.1M 6K 449
databrickslabs
dbl-tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation

3M 341 59
combust
mleap

MLeap: Deploy ML Pipelines to Production

2.8M 2K 316
apache
apache-sedona

A cluster computing framework for processing large-scale geospatial data

2.3M 2K 756
Microsoft
synapseml

Simple and Distributed Machine Learning

2.2M 5K 861
lucacanali
sparkmeasure

This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simplifies collecting, aggregating, and exporting Spark task/stage metrics, and is designed for practical use by developers and data engineers in interactive analysis, testing, and production monitoring workflows.

1.5M 821 160
apache
pyspark-client

Apache Spark - A unified analytics engine for large-scale data processing

1.4M 43K 29K
jelmerk
pyspark-hnsw

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

1.3M 303 59
tree-sitter
tree-sitter-scala

Scala grammar for tree-sitter

535K 184 66
G-Research
pyspark-extension

A library that provides useful extensions to Apache Spark and PySpark.

87K 236 30
h2oai
h2o-pysparkling-3-1

Sparkling Water provides H2O functionality inside Spark cluster

79K 977 361
aws
sagemaker-pyspark

A Spark library for Amazon SageMaker.

66K 301 128
pantsbuild
pantsbuild-pants

The Pants Build System

32K 4K 699
h2oai
h2o-pysparkling-3-4

Sparkling Water provides H2O functionality inside Spark cluster

26K 977 361
h2oai
h2o-pysparkling-3-5

Sparkling Water provides H2O functionality inside Spark cluster

25K 977 361
h2oai
h2o-pysparkling-2-4

Sparkling Water provides H2O functionality inside Spark cluster

19K 977 361
h2oai
h2o-pysparkling-3-3

Sparkling Water provides H2O functionality inside Spark cluster

17K 977 361
logicalclocks
hsfs

Python - Java/Scala API for the Hopsworks feature store

15K 55 42
yahoo
tensorflowonspark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

11K 4K 940
pantsbuild
pantsbuild-pants-testutil

The Pants Build System

11K 4K 699
DataSystemsLab
geospark

A cluster computing framework for processing large-scale geospatial data

6K 2K 756
pantsbuild
pantsbuild-pants-contrib-scrooge

The Pants Build System

4K 4K 699
pantsbuild
pantsbuild-pants-contrib-go

The Pants Build System

4K 4K 699
Data from PyPI, GitHub, ClickHouse, and BigQuery