TurboQuant KV cache compression plugin for vLLM: asymmetric K/V, validated on 8 models, consumer GPUs
Interactive CLI chat client for vLLM inference servers with persistent sessions and automatic context management