PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics
LMCache
lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

120K 8K 1K
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

38K 386 42
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

8K 46 5
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

7K 24 1
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

4K 33 7
Omodaka9375
fade-kv

Frequency-Adaptive Decay Encoding: Attention-aware tiered KV cache compression for LLM inference.

4K 0 0
manjunathshiva
turboquant-mlx-full

Extreme weight and KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

3K 8 1
VectorArc
avp

Python SDK for Agent Vector Protocol – transfer KV-cache between LLM agents instead of text

2K 19 1
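
The premise of transferring a KV cache instead of text can be illustrated with plain torch and transformers: the cache is just tensors, so one agent can prefill a context and ship the result to another agent running the same weights. This is a conceptual sketch of the idea only; the avp SDK's actual API is not shown here.

```python
# Conceptual sketch of a "KV cache instead of text" handoff between agents.
# Plain torch/transformers only -- NOT the avp SDK's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Agent A: prefill the shared context once and export the cache.
ctx = tok("Shared context: the incident log shows three failed deploys.",
          return_tensors="pt")
past = model(**ctx, use_cache=True).past_key_values
torch.save(past, "handoff.pt")  # stand-in for a network transfer

# Agent B (same weights): resume from the cache with only the new tokens,
# skipping the prefill entirely.
cache = torch.load("handoff.pt", weights_only=False)
nxt = tok(" Root cause:", return_tensors="pt")
out = model(input_ids=nxt.input_ids, past_key_values=cache, use_cache=True)
```
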
AlphaWaveSystems
tqai

TurboQuant KV cache compression for local LLM inference

2K 1 0
kddubey
cappr

Completion After Prompt Probability. Make your LLM make a choice

1K 82 3
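
cappr's documented pattern is to score each candidate completion by its probability conditional on the prompt and return the most likely one. The sketch below is from memory of the package's README; treat the argument names as assumptions and verify against the docs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict  # signature assumed from README

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

prompt = "That movie was a complete waste of time. The sentiment is"
completions = ("positive", "negative")
# Returns whichever completion is most probable after the prompt.
print(predict(prompt, completions, model_and_tokenizer=(model, tok)))
```
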
danhicks96
prismkv

3-D Stacked-Plane KV Cache Quantizer — defensive prior art publication

575 1 1
Hkshoonya
spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection. Practically lossless on modern architectures.

515 0 0
jagmarques
nexusquant-kv

Training-free KV cache compression for LLMs. 10-33x compression via E8 lattice quantization + attention-aware token eviction. One line of code.

476 13 0
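
For scale, the standard KV cache sizing arithmetic shows why ratios like 10-33x matter. It is worked here for a GQA model with Llama-3-8B's shapes (32 layers, 8 KV heads, head dim 128, fp16); the formula is general and independent of any package on this page.

```python
# bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem
layers, kv_heads, head_dim, fp16_bytes = 32, 8, 128, 2  # Llama-3-8B, GQA
per_token = 2 * layers * kv_heads * head_dim * fp16_bytes
print(per_token // 1024, "KiB per token")               # 128 KiB

seq_len = 32_768
gib = per_token * seq_len / 2**30
print(f"{gib:.1f} GiB at {seq_len} tokens")             # 4.0 GiB
print(f"{gib / 10:.2f} GiB after 10x compression")      # 0.40 GiB
```
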
adwantg
kvfleet

Production-grade, KV-cache-aware intelligent routing for self-hosted and hybrid LLM fleets.

464 0 0
FluffyAIcode
kakeyalattice

Discrete Kakeya cover for LLM KV cache: D4/E8 nested-lattice quantisation realising a Kakeya-style tube-cover over the direction sphere. 2.4x-2.8x compression at <1% perplexity loss on Qwen3, Llama-3, DeepSeek, GLM-4, Gemma. Drop-in transformers.DynamicCache. pip install kakeyalattice.

365 7 2
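
"Drop-in transformers.DynamicCache" implies usage along these lines; the class name and import path below are guesses inferred from the blurb, not taken from the package's docs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from kakeyalattice import KakeyaLatticeCache  # hypothetical name and path

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tok("KV cache compression works by", return_tensors="pt")

# Used exactly where transformers.DynamicCache would otherwise go.
cache = KakeyaLatticeCache()
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```
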
l3tchupkt
adaptq

High-performance CPU KV-cache quantization engine for LLM inference (~10× speedup, 4× memory reduction) with Python & PyTorch support.

290 1 0
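
The quoted 4x memory reduction is what int8 storage gives over fp32; the core round-trip any KV-cache quantization engine performs looks like this generic per-channel sketch (illustrative only, not adaptq's code or API).

```python
import numpy as np

kv = np.random.randn(8, 128).astype(np.float32)  # (tokens, head_dim) slice

scale = np.abs(kv).max(axis=0) / 127.0           # one scale per channel
q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)  # 4x smaller
dq = q.astype(np.float32) * scale                # dequantize on read

print("max abs error:", np.abs(kv - dq).max())
```
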
Keyvanhardani
kvat

Automatic KV-Cache optimization for HuggingFace Transformers. Find the optimal cache strategy, attention backend, and dtype for your LLM inference workload.

255 1 0
LMCache
lmcache-cli

An LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

209 8K 1K
Argonaut790
fused-turboquant

Fused Triton encode/decode kernels for TurboQuant KV cache compression, powered by Randomized Hadamard Transform.

208 8 0
wjddusrb03
langchain-turboquant

TurboQuant vector store for LangChain — 6x memory reduction with training-free quantization

171 1 2
vivekvar-dl
turbokv

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

164 1 0
Mmorgan-ML
phase-slip-sampler

Phase-Slip is a stochastic intervention architecture that operates on the model's key-value cache. It gently rotates the semantic vectors of the context window, asking the model: "How would you finish this sentence if you looked at it from a slightly different perspective?"

160 6 0
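
Mechanically, a "gentle rotation" of cached vectors can be pictured as a near-identity orthogonal transform applied to the keys: direction is perturbed while norms are preserved. A conceptual sketch of that idea, not the package's actual mechanism.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 64))   # (cached tokens, head_dim)

A = rng.standard_normal((64, 64))
R = expm(0.02 * (A - A.T))             # exp of skew-symmetric => orthogonal

slipped = keys @ R.T                   # small directional perturbation
print(np.allclose(np.linalg.norm(keys, axis=1),
                  np.linalg.norm(slipped, axis=1)))  # True: norms preserved
```
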
vivekvar-dl
turboquant-impl

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

122 1 0
back2matching
kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers. Supports Ollama + llama.cpp.

122 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery