PyPI Stats

Evaluation Python Packages

Python packages tagged with the GitHub topic "evaluation", sorted by relevance. Each entry lists the repository owner, the package name, a short description, and monthly downloads, GitHub stars, and forks.
langchain-ai / langsmith
LangSmith Client SDK Implementations
81M downloads · 871 stars · 228 forks

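The SDK's tracing decorator is the quickest way to see it in action; a minimal sketch, assuming the langsmith package is installed and a LANGSMITH_API_KEY is set in the environment (the answer function is a hypothetical stand-in for a real LLM call):

    from langsmith import traceable

    @traceable  # each call is recorded as a run in the configured LangSmith project
    def answer(question: str) -> str:
        # hypothetical stand-in for a real model call
        return f"echo: {question}"

    answer("What does this package do?")
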
mlflow / mlflow-skinny
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
38.3M downloads · 26K stars · 6K forks

mlflow / mlflow
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
36.4M downloads · 26K stars · 6K forks

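A minimal sketch of MLflow's tracking API, assuming the default local file store (runs land in ./mlruns):

    import mlflow

    # open a run and record a parameter and a metric in the local ./mlruns store
    with mlflow.start_run(run_name="demo"):
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("accuracy", 0.93)
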
danthedeckie / simpleeval
Simple Safe Sandboxed Extensible Expression Evaluator for Python
16.7M downloads · 595 stars · 92 forks

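A minimal sketch of simpleeval's core call, which evaluates an untrusted expression without handing the string to Python's built-in eval():

    from simpleeval import simple_eval

    simple_eval("2 + 2 * 10")                                        # 22
    simple_eval("a + b", names={"a": 11, "b": 31})                   # 42
    simple_eval("square(4)", functions={"square": lambda x: x * x})  # 16
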
mlflow / mlflow-tracing
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
16.2M downloads · 26K stars · 6K forks

huggingface / evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
7.1M downloads · 2K stars · 318 forks

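Metrics in 🤗 Evaluate are loaded by name and computed from predictions and references; a minimal sketch with the built-in accuracy metric (the first load fetches the metric from the Hugging Face Hub):

    import evaluate

    accuracy = evaluate.load("accuracy")
    result = accuracy.compute(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0])
    print(result)  # {'accuracy': 0.75}
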
comet-ml / opik
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
5.1M downloads · 19K stars · 1K forks

vibrantlabsai / ragas
Supercharge Your LLM Application Evaluations 🚀
1.4M downloads · 14K stars · 1K forks

MiXaiLL76 / faster-coco-eval
Continuation of the abandoned fast-coco-eval project
539K downloads · 141 stars · 11 forks

MichaelGrupp / evo
Python package for the evaluation of odometry and SLAM
187K downloads · 4K stars · 790 forks

jfjlaros / spreadscript
SpreadScript: Use a spreadsheet as a function.
94K downloads · 1 star · 0 forks

AmenRa / ranx
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
90K downloads · 674 stars · 31 forks

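ranx builds its Qrels and Run objects from plain nested dictionaries; a minimal sketch (relevance scores and metric choices are illustrative):

    from ranx import Qrels, Run, evaluate

    qrels = Qrels({"q_1": {"doc_a": 1, "doc_b": 0, "doc_c": 1}})
    run = Run({"q_1": {"doc_a": 0.9, "doc_b": 0.8, "doc_c": 0.1}})

    print(evaluate(qrels, run, ["ndcg@3", "map"]))  # metric name -> score
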
comet-ml / opik-optimizer
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
58K downloads · 19K stars · 1K forks

agenta-ai / agenta
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
56K downloads · 4K stars · 517 forks

cvangysel / pytrec-eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
50K downloads · 346 stars · 36 forks

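Like trec_eval, pytrec_eval takes relevance judgments and a run as nested dictionaries keyed by query and document id; a minimal sketch:

    import pytrec_eval

    qrel = {"q1": {"d1": 1, "d2": 0, "d3": 1}}
    run = {"q1": {"d1": 0.9, "d2": 0.5, "d3": 0.1}}

    evaluator = pytrec_eval.RelevanceEvaluator(qrel, {"map", "ndcg"})
    print(evaluator.evaluate(run))  # {'q1': {'map': ..., 'ndcg': ...}}
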
yaroslaff / evalidate
Safe and fast evaluation of untrusted user-supplied python expressions
49K downloads · 40 stars · 4 forks

modelscope / evalscope
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
47K downloads · 3K stars · 322 forks

thakur-nandan / sprint-toolkit
SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval
41K downloads · 47 stars · 2 forks

ibm / unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking
34K downloads · 212 stars · 67 forks

run-house / kubetorch
Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.
30K downloads · 1K stars · 57 forks

run-house / runhouse
Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.
27K downloads · 1K stars · 57 forks

huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
23K downloads · 2K stars · 454 forks

foreai-co / fore
The fore client package
21K downloads · 13 stars · 1 fork

dustalov / evalica
Evalica, your favourite evaluation toolkit
21K downloads · 62 stars · 5 forks

    • Data from PyPI, GitHub, ClickHouse, and BigQuery