PyPI Stats

KV Cache Python Packages

Python packages with the GitHub topic kv-cache, sorted by relevance, with stars and monthly downloads shown for each entry.
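Many of the packages below advertise low-bit KV-cache quantization (taglines like "4× memory reduction"). The core idea behind such claims, independent of any specific package listed here, is symmetric integer quantization: scale each value into the int8 range, store the 8-bit codes plus one scale factor, and dequantize on read. The sketch below is a generic, dependency-free illustration of that idea — it is not the implementation of any package on this page.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map values to codes in [-127, 127] plus one float scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # guard against an all-zero vector
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the stored codes."""
    return [c * scale for c in codes]

# A toy "KV vector": fp32 (4 bytes/value) stored as int8 (1 byte/value)
# is where the commonly cited ~4x memory reduction comes from.
kv = [0.8, -1.2, 0.05, 2.4]
codes, scale = quantize_int8(kv)
restored = dequantize_int8(codes, scale)
max_err = max(abs(a - b) for a, b in zip(kv, restored))
```

Real KV-cache quantizers differ mainly in where they apply this (per-channel, per-token, per-head), in the transform applied before quantizing (e.g. the Hadamard rotations several TurboQuant ports mention), and in how aggressively they drop below 8 bits.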
LMCache
lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

112K 8K 1K
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

40K 386 42
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

8K 24 1
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

7K 46 5
Omodaka9375
fade-kv

Frequency-Adaptive Decay Encoding: Attention-aware tiered KV cache compression for LLM inference.

4K 0 0
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

4K 33 7
manjunathshiva
turboquant-mlx-full

Extreme weight and KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

3K 8 1
VectorArc
avp

Python SDK for Agent Vector Protocol – transfer KV-cache between LLM agents instead of text

2K 19 1
AlphaWaveSystems
tqai

TurboQuant KV cache compression for local LLM inference

1K 1 0
kddubey
cappr

Completion After Prompt Probability. Make your LLM make a choice

1K 82 3
danhicks96
prismkv

3-D Stacked-Plane KV Cache Quantizer — defensive prior art publication

494 1 1
adwantg
kvfleet

Production-grade, KV-cache-aware intelligent routing for self-hosted and hybrid LLM fleets.

455 0 0
FluffyAIcode
kakeyalattice

Discrete Kakeya cover for LLM KV cache: D4/E8 nested-lattice quantisation realising a Kakeya-style tube-cover over the direction sphere. 2.4x-2.8x compression at <1% perplexity loss on Qwen3, Llama-3, DeepSeek, GLM-4, Gemma. Drop-in transformers.DynamicCache. pip install kakeyalattice.

386 7 2
l3tchupkt
adaptq

High-performance CPU KV-cache quantization engine for LLM inference (~10× speedup, 4× memory reduction) with Python & PyTorch support.

307 1 0
Keyvanhardani
kvat

Automatic KV-Cache optimization for HuggingFace Transformers. Find the optimal cache strategy, attention backend, and dtype for your LLM inference workload.

252 1 0
LMCache
lmcache-cli

An LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

230 8K 1K
Argonaut790
fused-turboquant

Fused Triton encode/decode kernels for TurboQuant KV cache compression, powered by Randomized Hadamard Transform.

226 8 0
jagmarques
nexusquant-kv

Training-free KV cache compression for LLMs. 10-33x compression via E8 lattice quantization + attention-aware token eviction. One line of code.

219 13 0
Hkshoonya
spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection. Practically lossless on modern architectures.

188 0 0
wjddusrb03
langchain-turboquant

TurboQuant vector store for LangChain — 6x memory reduction with training-free quantization

183 1 2
vivekvar-dl
turbokv

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

163 1 0
Mmorgan-ML
phase-slip-sampler

Phase-Slip is a stochastic intervention architecture that operates on the Key-Value Cache of the model. Phase-Slip gently rotates the semantic vectors of the context window, asking the model: "How would you finish this sentence if you looked at it from a slightly different perspective?"

162 6 0
vivekvar-dl
turboquant-impl

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

133 1 0
back2matching
kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers. Supports Ollama + llama.cpp.

133 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery