PyPI Stats

KV Cache Python Packages

Python packages with the GitHub topic kv-cache, sorted by relevance, with stars and monthly downloads shown for each entry.
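Many of the packages below advertise low-bit KV-cache quantization (taglines like "4× memory reduction"). The core idea behind such claims, independent of any specific package listed here, is symmetric integer quantization: scale each value into the int8 range, store the 8-bit codes plus one scale factor, and dequantize on read. The sketch below is a generic, dependency-free illustration of that idea — it is not the implementation of any package on this page.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map values to codes in [-127, 127] plus one float scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # guard against an all-zero vector
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the stored codes."""
    return [c * scale for c in codes]

# A toy "KV vector": fp32 (4 bytes/value) stored as int8 (1 byte/value)
# is where the commonly cited ~4x memory reduction comes from.
kv = [0.8, -1.2, 0.05, 2.4]
codes, scale = quantize_int8(kv)
restored = dequantize_int8(codes, scale)
max_err = max(abs(a - b) for a, b in zip(kv, restored))
```

Real KV-cache quantizers differ mainly in where they apply this (per-channel, per-token, per-head), in the transform applied before quantizing (e.g. the Hadamard rotations several TurboQuant ports mention), and in how aggressively they drop below 8 bits.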
LMCache
lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

112K 8K 1K
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

40K 386 42
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

8K 24 1
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

7K 46 5
Omodaka9375
fade-kv

Frequency-Adaptive Decay Encoding: Attention-aware tiered KV cache compression for LLM inference.

4K 0 0
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

4K 33 7
manjunathshiva
turboquant-mlx-full

Extreme weight and KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

3K 8 1
VectorArc
avp

Python SDK for Agent Vector Protocol – transfer KV-cache between LLM agents instead of text

2K 19 1
AlphaWaveSystems
tqai

TurboQuant KV cache compression for local LLM inference

1K 1 0
kddubey
cappr

Completion After Prompt Probability. Make your LLM make a choice

1K 82 3
danhicks96
prismkv

3-D Stacked-Plane KV Cache Quantizer — defensive prior art publication

494 1 1
adwantg
kvfleet

Production-grade, KV-cache-aware intelligent routing for self-hosted and hybrid LLM fleets.

455 0 0
FluffyAIcode
kakeyalattice

Discrete Kakeya cover for LLM KV cache: D4/E8 nested-lattice quantisation realising a Kakeya-style tube-cover over the direction sphere. 2.4x-2.8x compression at <1% perplexity loss on Qwen3, Llama-3, DeepSeek, GLM-4, Gemma. Drop-in transformers.DynamicCache. pip install kakeyalattice.

386 7 2
l3tchupkt
adaptq

High-performance CPU KV-cache quantization engine for LLM inference (~10× speedup, 4× memory reduction) with Python & PyTorch support.

307 1 0
Keyvanhardani
kvat

Automatic KV-Cache optimization for HuggingFace Transformers. Find the optimal cache strategy, attention backend, and dtype for your LLM inference workload.

252 1 0
LMCache
lmcache-cli

An LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

230 8K 1K
Argonaut790
fused-turboquant

Fused Triton encode/decode kernels for TurboQuant KV cache compression, powered by Randomized Hadamard Transform.

226 8 0
jagmarques
nexusquant-kv

Training-free KV cache compression for LLMs. 10-33x compression via E8 lattice quantization + attention-aware token eviction. One line of code.

219 13 0
Hkshoonya
spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection. Practically lossless on modern architectures.

188 0 0
wjddusrb03
langchain-turboquant

TurboQuant vector store for LangChain — 6x memory reduction with training-free quantization

183 1 2
vivekvar-dl
turbokv

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

163 1 0
Mmorgan-ML
phase-slip-sampler

Phase-Slip is a stochastic intervention architecture that operates on the Key-Value Cache of the model. Phase-Slip gently rotates the semantic vectors of the context window, asking the model: "How would you finish this sentence if you looked at it from a slightly different perspective?"

162 6 0
vivekvar-dl
turboquant-impl

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

133 1 0
back2matching
kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers. Supports Ollama + llama.cpp.

133 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery