PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
mlflow
mlflow-skinny

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

38.1M 26K 6K
mlflow
mlflow

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

36M 26K 6K
mlflow
mlflow-tracing

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

16.1M 26K 6K
graphframes
graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

2.9M 1K 266
Microsoft
synapseml

Simple and Distributed Machine Learning

2.2M 5K 861
lucacanali
sparkmeasure

This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simplifies collecting, aggregating, and exporting Spark task/stage metrics, and is designed for practical use by developers and data engineers in interactive analysis, testing, and production monitoring workflows.

1.5M 821 160
graphframes
graphframes-py

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

1.2M 1K 266
treeverse
lakefs-sdk

lakeFS - Data version control for your data lake | Git for data

1.1M 5K 446
treeverse
lakefs

lakeFS - Data version control for your data lake | Git for data

920K 5K 446
MrPowers
quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

635K 687 95
zero323
pyspark-stubs

Apache (Py)Spark type annotations (stub files).

358K 118 36
treeverse
lakefs-client

lakeFS - Data version control for your data lake | Git for data

211K 5K 446
databricks
spark-sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

126K 1K 224
svenkreiss
pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

124K 271 45
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

32K 2K 129
mattjw
sparkql

sparkql: Apache Spark SQL DataFrame schema management for sensible humans

16K 12 4
G-Research
fasttrackml

Experiment tracking server focused on speed and scalability

8K 117 18
graphframes
graphframes-latest

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

5K 1K 266
abronte
pysparkgateway

Connect to remote Spark clusters seamlessly.

4K 3 4
zero323
pyspark-asyncactions

Asynchronous actions for PySpark

3K 48 2
maxpoint
spylon

Utilities to work with Scala/Java code with py4j

3K 40 17
dsgrid
dsgrid-toolkit

Python API for accessing demand-side grid model (dsgrid) datasets

3K 33 5
kubeflow
mcp-apache-spark-history-server

MCP Server and CLI for Apache Spark History Server. Debug Spark applications from AI agents, scripts, or the terminal.

2K 163 55
LucaCanali
sparkhistogram

Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark. Also tools for stress testing, measuring CPUs' performance, and I/O latency heat maps. Jupyter notebooks examples for using various DB systems.

2K 460 154

Data from PyPI, GitHub, ClickHouse, and BigQuery