PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics
LMCache
lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

120K 8K 1K
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

38K 386 42
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

8K 46 5
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

7K 24 1
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

4K 33 7
Omodaka9375
fade-kv

Frequency-Adaptive Decay Encoding: Attention-aware tiered KV cache compression for LLM inference.

4K 0 0
manjunathshiva
turboquant-mlx-full

Extreme weight and KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

3K 8 1
VectorArc
avp

Python SDK for Agent Vector Protocol – transfer KV-cache between LLM agents instead of text

2K 19 1
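
The premise of transferring a KV cache instead of text can be illustrated with plain torch and transformers: the cache is just tensors, so one agent can prefill a context and ship the result to another agent running the same weights. This is a conceptual sketch of the idea only; the avp SDK's actual API is not shown here.

```python
# Conceptual sketch of a "KV cache instead of text" handoff between agents.
# Plain torch/transformers only -- NOT the avp SDK's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Agent A: prefill the shared context once and export the cache.
ctx = tok("Shared context: the incident log shows three failed deploys.",
          return_tensors="pt")
past = model(**ctx, use_cache=True).past_key_values
torch.save(past, "handoff.pt")  # stand-in for a network transfer

# Agent B (same weights): resume from the cache with only the new tokens,
# skipping the prefill entirely.
cache = torch.load("handoff.pt", weights_only=False)
nxt = tok(" Root cause:", return_tensors="pt")
out = model(input_ids=nxt.input_ids, past_key_values=cache, use_cache=True)
```
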
AlphaWaveSystems
tqai

TurboQuant KV cache compression for local LLM inference

2K 1 0
kddubey
cappr

Completion After Prompt Probability. Make your LLM make a choice

1K 82 3
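
cappr's documented pattern is to score each candidate completion by its probability conditional on the prompt and return the most likely one. The sketch below is from memory of the package's README; treat the argument names as assumptions and verify against the docs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict  # signature assumed from README

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

prompt = "That movie was a complete waste of time. The sentiment is"
completions = ("positive", "negative")
# Returns whichever completion is most probable after the prompt.
print(predict(prompt, completions, model_and_tokenizer=(model, tok)))
```
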
danhicks96
prismkv

3-D Stacked-Plane KV Cache Quantizer — defensive prior art publication

575 1 1
Hkshoonya
spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection. Practically lossless on modern architectures.

515 0 0
jagmarques
nexusquant-kv

Training-free KV cache compression for LLMs. 10-33x compression via E8 lattice quantization + attention-aware token eviction. One line of code.

476 13 0
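
For scale, the standard KV cache sizing arithmetic shows why ratios like 10-33x matter. It is worked here for a GQA model with Llama-3-8B's shapes (32 layers, 8 KV heads, head dim 128, fp16); the formula is general and independent of any package on this page.

```python
# bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem
layers, kv_heads, head_dim, fp16_bytes = 32, 8, 128, 2  # Llama-3-8B, GQA
per_token = 2 * layers * kv_heads * head_dim * fp16_bytes
print(per_token // 1024, "KiB per token")               # 128 KiB

seq_len = 32_768
gib = per_token * seq_len / 2**30
print(f"{gib:.1f} GiB at {seq_len} tokens")             # 4.0 GiB
print(f"{gib / 10:.2f} GiB after 10x compression")      # 0.40 GiB
```
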
adwantg
kvfleet

Production-grade, KV-cache-aware intelligent routing for self-hosted and hybrid LLM fleets.

464 0 0
FluffyAIcode
kakeyalattice

Discrete Kakeya cover for LLM KV cache: D4/E8 nested-lattice quantisation realising a Kakeya-style tube-cover over the direction sphere. 2.4x-2.8x compression at <1% perplexity loss on Qwen3, Llama-3, DeepSeek, GLM-4, Gemma. Drop-in transformers.DynamicCache. pip install kakeyalattice.

365 7 2
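
"Drop-in transformers.DynamicCache" implies usage along these lines; the class name and import path below are guesses inferred from the blurb, not taken from the package's docs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from kakeyalattice import KakeyaLatticeCache  # hypothetical name and path

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tok("KV cache compression works by", return_tensors="pt")

# Used exactly where transformers.DynamicCache would otherwise go.
cache = KakeyaLatticeCache()
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```
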
l3tchupkt
adaptq

High-performance CPU KV-cache quantization engine for LLM inference (~10× speedup, 4× memory reduction) with Python & PyTorch support.

290 1 0
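
The quoted 4x memory reduction is what int8 storage gives over fp32; the core round-trip any KV-cache quantization engine performs looks like this generic per-channel sketch (illustrative only, not adaptq's code or API).

```python
import numpy as np

kv = np.random.randn(8, 128).astype(np.float32)  # (tokens, head_dim) slice

scale = np.abs(kv).max(axis=0) / 127.0           # one scale per channel
q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)  # 4x smaller
dq = q.astype(np.float32) * scale                # dequantize on read

print("max abs error:", np.abs(kv - dq).max())
```
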
Keyvanhardani
kvat

Automatic KV-Cache optimization for HuggingFace Transformers. Find the optimal cache strategy, attention backend, and dtype for your LLM inference workload.

255 1 0
LMCache
lmcache-cli

An LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

209 8K 1K
Argonaut790
fused-turboquant

Fused Triton encode/decode kernels for TurboQuant KV cache compression, powered by Randomized Hadamard Transform.

208 8 0
wjddusrb03
langchain-turboquant

TurboQuant vector store for LangChain — 6x memory reduction with training-free quantization

171 1 2
vivekvar-dl
turbokv

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

164 1 0
Mmorgan-ML
phase-slip-sampler

Phase-Slip is a stochastic intervention architecture that operates on the model's key-value cache. It gently rotates the semantic vectors of the context window, asking the model: "How would you finish this sentence if you looked at it from a slightly different perspective?"

160 6 0
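
Mechanically, a "gentle rotation" of cached vectors can be pictured as a near-identity orthogonal transform applied to the keys: direction is perturbed while norms are preserved. A conceptual sketch of that idea, not the package's actual mechanism.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 64))   # (cached tokens, head_dim)

A = rng.standard_normal((64, 64))
R = expm(0.02 * (A - A.T))             # exp of skew-symmetric => orthogonal

slipped = keys @ R.T                   # small directional perturbation
print(np.allclose(np.linalg.norm(keys, axis=1),
                  np.linalg.norm(slipped, axis=1)))  # True: norms preserved
```
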
vivekvar-dl
turboquant-impl

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression

122 1 0
back2matching
kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers. Supports Ollama + llama.cpp.

122 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery