PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
Basaltlabs-app
gauntlet-cli

Behavioral reliability under pressure. Test how LLMs behave when things get hard.

10K 6 0
debiai
debiai-gui

Bias detection and contextual evaluation tool for your AI projects

3K 30 5
star-whale
starwhale-bootstrap

an MLOps/LLMOps platform

2K 238 39
star-whale
starwhale

an MLOps/LLMOps platform

2K 238 39
metno
pyaerocom

Python tools for climate and air quality model evaluation

2K 31 15
oaslananka
fovux-mcp

Local-first YOLO workbench for edge-AI computer vision: MCP server + VS Code Studio for dataset inspection, training, evaluation, export, and RTSP inference.

2K 1 0
stef41
modeldiffx

Model behavioral diffing: compare LLM outputs across versions and detect regressions.

1K 1 0
evaluation-context-protocol
ecp-runtime

ECP is a standardized interface for orchestrating, auditing, and enforcing authority limits in AI agent evaluations. It moves evaluation from "brittle Python scripts" to a deterministic infrastructure protocol.

1K 8 1
evaluation-context-protocol
ecp-sdk

ECP is a standardized interface for orchestrating, auditing, and enforcing authority limits in AI agent evaluations. It moves evaluation from "brittle Python scripts" to a deterministic infrastructure protocol.

1K 8 1
Khanz9664
trustlens

Open-source Python library for evaluating ML model reliability beyond accuracy: calibration, failure, and fairness diagnostics for informed deployment decisions.

995 10 12
ankurpand3y
judicator

Who evaluates the evaluator? Judicator audits LLM-as-a-Judge systems for 7 documented bias types. Zero config. Works with any LLM.

973 5 1
burning-cost
insurance-cv

Temporal and distributional cross-validation, and feature screening, for insurance pricing models

539 0 0
wmjg-alt
easymlselector

A model selection process for machine learning tasks on a subset of the training sample

499 0 0
daniel-yj-yang
machlearn

A Simple Yet Powerful Machine Learning Python Library

400 1 0
Ricardouchub
evalcards

Python library for generating evaluation reports (classification, regression, forecasting) with ready-to-use metrics and charts in Markdown, JSON, and soon HTML.

342 1 0
animator
titus2

Titus 2: Portable Format for Analytics (PFA) implementation for Python 3.4+

323 24 2
metriculous-ml
metriculous

Measure and visualize machine learning model performance without the usual boilerplate.

231 98 11
medoidai
skrobot

skrobot is a Python module for designing, running, and tracking machine learning experiments and tasks. It is built on top of the scikit-learn framework.

218 24 2
Lantianzz
scorecardbundle

A High-level Scorecard Modeling API | Everything for scorecard modeling, right here

217 83 30
Rowusuduah
llm-sentry

Unified AI Reliability Platform. One install, 12 diagnostic engines. Zero-dependency LLM pipeline monitoring.

215 0 0
vAndrewKarma
data-science-snippets

A modular set of data science utilities for EDA, cleaning, and more.

190 3 2
luna-ml
luna-ml

Luna ML - ML Leaderboard for your team with automatic model evaluation

189 4 1
vAndrewKarma
utils-axn-2237

🧰 Essential EDA and data-cleaning helpers for any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.

179 3 2
SubaashNair
smartpredict

An advanced machine learning library designed to simplify model training, evaluation, and selection.

172 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery