PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics
he-yufeng
litebench

A pip-installable benchmark runner for LLMs and agents. Five minutes to your first eval.

1K 0 0
NahuelGiudizi
llm-benchmark-toolkit

Benchmark LLMs with 10 benchmarks & 132K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Unified CLI + Web dashboard.

842 1 1
sergeyklay
factly-eval

CLI tool to evaluate LLM factuality on MMLU benchmark.

72 2 0
Data from PyPI, GitHub, ClickHouse, and BigQuery