PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.

mlflow (26K stars · 6K forks): The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
  • mlflow-skinny: 38.1M downloads
  • mlflow: 36M downloads
  • mlflow-tracing: 16.1M downloads

comet-ml / opik (4.8M downloads · 19K stars · 1K forks): Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

confident-ai / deepeval (3.5M downloads · 15K stars · 1K forks): The LLM Evaluation Framework

Arize-ai (10K stars · 850 forks): AI Observability & Evaluation
  • arize-phoenix: 2.2M downloads
  • arize-phoenix-otel: 1.6M downloads
  • arize-phoenix-client: 881K downloads
  • arize-phoenix-evals: 762K downloads

Microsoft / prompty (438K downloads · 1K stars · 113 forks): Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers.

JudgmentLabs / judgeval (330K downloads · 1K stars · 91 forks): The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

truera / trulens-core (86K downloads · 3K stars · 271 forks): Evaluation and Tracking for LLM Experiments and AI Agents

NVIDIA / garak (73K downloads · 8K stars · 922 forks): The LLM vulnerability scanner

comet-ml / opik-optimizer (60K downloads · 19K stars · 1K forks): Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

truera (3K stars · 271 forks): Evaluation and Tracking for LLM Experiments and AI Agents
  • trulens-otel-semconv: 59K downloads
  • trulens-feedback: 58K downloads

agenta-ai / agenta (57K downloads · 4K stars · 517 forks): The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

truera (3K stars · 271 forks): Evaluation and Tracking for LLM Experiments and AI Agents
  • trulens-dashboard: 56K downloads
  • trulens-eval: 48K downloads
  • trulens: 43K downloads

Giskard-AI / giskard (40K downloads · 5K stars · 446 forks): 🐢 Open-Source Evaluation & Testing library for LLM Agents

truera (3K stars · 271 forks): Evaluation and Tracking for LLM Experiments and AI Agents
  • trulens-connectors-snowflake: 38K downloads
  • trulens-providers-cortex: 22K downloads

EvolvingLMMs-Lab / lmms-eval (16K downloads · 4K stars · 578 forks): One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Data from PyPI, GitHub, ClickHouse, and BigQuery