PyPI Stats

Benchmark Python Packages

Python packages tagged with the GitHub topic "benchmark", sorted by relevance and shown with GitHub stars and monthly downloads.
swe-bench
swebench

SWE-bench: Can Language Models Resolve Real-world Github Issues?

39.1M 5K 849
ionelmc
pytest-benchmark

pytest fixture for benchmarking code

13.5M 1K 132
embeddings-benchmark
mteb

MTEB: Massive Text Embedding Benchmark

2.8M 3K 608
smarie
pytest-harvest

Store data created during your `pytest` tests execution, and retrieve it at the end of the session, e.g. for applicative benchmarking purposes.

457K 76 10
airspeed-velocity
asv

Airspeed Velocity: A simple Python benchmarking tool with web-based reporting

403K 998 204
cheind
motmetrics

Benchmark multiple object trackers (MOT) in Python

214K 1K 262
MichaelGrupp
evo

Python package for the evaluation of odometry and SLAM

185K 4K 790
google
google-benchmark

A microbenchmark support library

117K 10K 2K
python
pyperformance

Python Performance Benchmark Suite

82K 1K 203
tarasko
picows

Ultra-fast websocket client and server for asyncio

57K 265 18
membrowse
membrowse

Track and analyze binary size and memory footprint in embedded firmware

46K 20 1
MedMNIST
medmnist

[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification

42K 1K 207
beir-cellar
beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

41K 2K 243
open-mmlab
mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.

41K 8K 1K
NyanKiyoshi
pytest-django-queries

Generate performance reports from your django database performance tests.

32K 83 2
ethz-spylab
agentdojo

A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

23K 548 141
evalplus
evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

19K 2K 195
blooop
holobench

A package for benchmarking the characteristics of arbitrary functions

18K 4 3
optuna
optunahub

Python library to use and implement packages in OptunaHub

18K 55 15
EvolvingLMMs-Lab
lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

17K 4K 578
huggingface
optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

13K 336 58
Ceyron
apebench

[NeurIPS 2024] A benchmark suite for autoregressive neural emulation of PDEs. (≥46 PDEs in 1D, 2D, 3D; Differentiable Physics; Unrolled Training; Rollout Metrics)

13K 100 2
logpai
logparser3

A machine learning toolkit for log parsing [ICSE'19, DSN'16]

12K 2K 580
Basaltlabs-app
gauntlet-cli

Behavioral reliability under pressure. Test how LLMs behave when things get hard.

11K 6 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery