Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Cuts TTFT by 40%, local-agent wall time by 46%, and RAM usage threefold.
LLM inference benchmarking toolkit. Measure TTFT, inter-token latency, throughput, and P50–P99 latencies across concurrency levels.
The only voice-agent context manager with a TTFT feedback loop.
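As a minimal sketch of how the latency metrics named above (TTFT, inter-token latency, throughput, percentiles) can be derived from a streamed response — all function and key names here are illustrative assumptions, not this toolkit's actual API:

```python
import statistics


def latency_metrics(token_times: list[float], request_start: float) -> dict:
    """Derive core streaming metrics from token arrival timestamps (seconds).

    token_times: wall-clock arrival time of each generated token.
    request_start: wall-clock time the request was sent.
    """
    # Time to first token: gap between sending the request and token #1.
    ttft = token_times[0] - request_start
    # Inter-token latency: gaps between consecutive token arrivals.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    total = token_times[-1] - request_start
    return {
        "ttft_s": ttft,
        "itl_p50_s": statistics.median(gaps),
        # 99th percentile of inter-token gaps (needs >= 2 gaps).
        "itl_p99_s": statistics.quantiles(gaps, n=100)[98],
        "throughput_tok_s": len(token_times) / total,
    }


# Example: 5 tokens, first arriving after 0.2 s, then one every 0.05 s.
m = latency_metrics([0.2, 0.25, 0.3, 0.35, 0.4], request_start=0.0)
```

In a real harness these timestamps would be captured per request at several concurrency levels, and the per-request metrics aggregated into the P50–P99 summaries.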