Adaptive inference budget controller for self-hosted LLMs. Controls thinking tokens, tracks GPU cost per query.