PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
agenta-ai
agenta

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

57K 4K 517
amiable-dev
llm-council-core

Multi-LLM council system with peer review and synthesis

3K 18 7
HZYAI
ragscore

⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.

2K 31 5
MigoXLab
dingo-python

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

2K 691 71
haizelabs
verdict

Inference-time scaling for LLMs-as-a-judge.

2K 339 25
root-signals
scorable

The Python SDK for the Scorable API

1K 14 1
ankurpand3y
judicator

Who evaluates the evaluator? Judicator audits LLM-as-a-Judge systems for 7 documented bias types. Zero config. Works with any LLM.

973 5 1
OtherVibes
mcp-as-a-judge

MCP as a Judge is a behavioral MCP that strengthens AI coding assistants by requiring explicit LLM evaluations

879 17 9
root-signals
root-signals

Scorable SDK

777 14 1
yonahgraphics
openevalkit

Production-grade Python framework for evaluating LLM and agentic systems with traditional scorers, LLM judges (OpenAI, Anthropic, Ollama, 100+ models via LiteLLM), ensemble aggregation, and smart caching for cost-effective testing.

766 3 0
ugai
pytest-llm-rubric

Pytest plugin for semantic PASS/FAIL checks using LLM-as-a-Judge

672 0 0
asarnaout
veritail

Ecommerce search relevance evaluation tool

541 5 1
trustyai-explainability
vllm-judge

A tiny, lightweight library for LLM-as-a-Judge evaluations on vLLM-hosted models.

432 2 2
docling-project
docling-sdg

A set of tools to create synthetically-generated data from documents

417 45 17
IAAR-Shanghai
xfinder

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

291 181 7
bassrehab
artemis-agents

Production-ready multi-agent debate framework with adaptive evaluation and safety monitoring

239 0 0
egerpaulj
llm-summary

Use an LLM to summarize paragraphs

205 0 0
root-signals
root-signals-cli

CLI for the Root Signals API

144 14 1
root-signals
scorable-cli

Scorable SDK

138 14 1
OtherVibes
iflow-mcp-hepivax-mcp-as-a-judge

MCP as a Judge is a behavioral MCP that strengthens AI coding assistants by requiring explicit LLM evaluations

101 17 9
rafaelsandroni
antibodies-rafaelsandroni

Antibodies for LLM hallucinations

97 0 0
rafaelsandroni
llm-antibodies

Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, reward models)

92 0 0
DataEval
dingo-client

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

49 693 71
Data from PyPI, GitHub, ClickHouse, and BigQuery