A pip-installable benchmark runner for LLMs and agents. Five minutes to your first eval.
Tune the initial recurrent state of hybrid models. Zero inference overhead.
A strict, auditable HumanEval benchmark runner for GGUF models served via llama.cpp, using its OpenAI-compatible HTTP API.
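Such a runner talks to the llama.cpp server over its OpenAI-compatible endpoints. A minimal sketch of one completion call, assuming a server started with `llama-server -m model.gguf` listening at `localhost:8080` (the URL, stop sequences, and helper names here are illustrative, not the tool's actual API):

```python
import json
import urllib.request

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    # Greedy decoding (temperature 0) keeps runs reproducible, which
    # matters for an auditable benchmark.
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
        # Typical stop sequences for HumanEval-style function completion.
        "stop": ["\ndef ", "\nclass ", "\nif __name__"],
    }

def complete(prompt: str, base_url: str = "http://localhost:8080") -> str:
    # POST to llama.cpp's OpenAI-compatible /v1/completions endpoint.
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

if __name__ == "__main__":
    print(complete('def add(a, b):\n    """Return a + b."""\n'))
```

The generated text would then be appended to the prompt and executed against the task's unit tests to score pass/fail.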