PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

4K 33 7
FlyTOmeLight
llm-cal

LLM inference hardware calculator — architecture-aware (MLA/NSA/MoE), engine-aware (vLLM/SGLang), honest-labeled. Reads real safetensors bytes; supports 53 GPUs (NVIDIA / AMD / Huawei Ascend / MetaX / Kunlunxin / Biren / Cambricon / Hygon).

1K 1 0
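The dominant term in the calculation a tool like this performs is weight memory: parameter count times bytes per parameter. A minimal sketch of that arithmetic (illustrative only — not llm-cal's actual code or output, which also accounts for KV cache, activations, and engine overhead):

```python
# Rough VRAM needed just for model weights: parameters x bytes per parameter.
# dtype widths are standard; the 7B example model size is an assumption here.

DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_vram_gib(num_params: float, dtype: str) -> float:
    """Return the weight footprint in GiB for a parameter count and dtype."""
    return num_params * DTYPE_BYTES[dtype] / 1024**3

# Example: a 7B-parameter model in fp16 needs ~13 GiB for weights alone.
print(f"{weight_vram_gib(7e9, 'fp16'):.1f} GiB")
```

This is why quantized 7B models fit a 16 GB consumer GPU while fp16 leaves little headroom for the KV cache.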
Andyyyy64
whichllm

Find the best LLM that runs on your hardware

824 16 2
Hkshoonya
spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection. Practically lossless on modern architectures.

515 0 0
CastelDazur
gpu-memory-guard

CLI tool to check GPU VRAM before loading AI models. Prevent OOM crashes.

246 10 0
back2matching
quantsim-bench

Which quantization should I use? One command benchmarks every quant level on YOUR GPU.

139 0 0
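The trade-off such a benchmark measures can be illustrated with a symmetric integer round-trip: fewer bits shrink memory but raise reconstruction error. A hedged sketch of the general technique (not quantsim-bench's implementation; the sample weights are made up):

```python
# Symmetric per-tensor quantization: map floats to signed ints with one scale,
# dequantize, and measure the round-trip error (RMSE).

def quantize_roundtrip(values: list[float], bits: int = 8) -> list[float]:
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for int8
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

def rmse(a: list[float], b: list[float]) -> float:
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

weights = [0.8, -1.2, 0.05, 0.33, -0.71]            # toy tensor
for bits in (8, 4):
    err = rmse(weights, quantize_roundtrip(weights, bits))
    print(f"int{bits}: RMSE {err:.4f}")
```

A real benchmark then weighs this accuracy loss against throughput and memory measured on the actual GPU, which is why the answer differs per card.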
back2matching
kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers. Supports Ollama + llama.cpp.

122 0 0
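The quantity these compression tools target follows a simple formula: KV cache bytes = 2 (K and V) × layers × KV heads × head dim × sequence length × bytes per element × batch. A sketch with Llama-2-7B-style numbers (the config values are assumptions for illustration, not kvcache-bench output):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2, batch: int = 1) -> int:
    """Uncompressed KV cache size: one K and one V tensor per layer per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes * batch

# Llama-2-7B-style config: 32 layers, 32 KV heads, head_dim 128, fp16.
gib = kv_cache_bytes(32, 32, 128, seq_len=4096) / 1024**3
print(f"{gib:.1f} GiB at 4096 tokens")  # → 2.0 GiB, growing linearly with context
```

Since the cache grows linearly with context length, long-context inference is where the compression ratios advertised above translate directly into larger usable batch sizes.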
HFerrahoglu
llm-neofetch-plus

LLM-Neofetch++ is an advanced system information tool designed specifically for local LLM (Large Language Model) usage. It provides detailed hardware detection with personalized recommendations for running AI models on your system.

103 1 0
mnisperuza
hcgk-kernel

Hardware Control GateKeeper Kernels for AI inference within frameworks.

86 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery