PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

4K 33 7
FlyTOmeLight
llm-cal

LLM inference hardware calculator — architecture-aware (MLA/NSA/MoE), engine-aware (vLLM/SGLang), honest-labeled. Reads real safetensors bytes; supports 53 GPUs (NVIDIA / AMD / Huawei Ascend / MetaX / Kunlunxin / Biren / Cambricon / Hygon).

1K 1 0
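The dominant term in the calculation a tool like this performs is weight memory: parameter count times bytes per parameter. A minimal sketch of that arithmetic (illustrative only — not llm-cal's actual code or output, which also accounts for KV cache, activations, and engine overhead):

```python
# Rough VRAM needed just for model weights: parameters x bytes per parameter.
# dtype widths are standard; the 7B example model size is an assumption here.

DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_vram_gib(num_params: float, dtype: str) -> float:
    """Return the weight footprint in GiB for a parameter count and dtype."""
    return num_params * DTYPE_BYTES[dtype] / 1024**3

# Example: a 7B-parameter model in fp16 needs ~13 GiB for weights alone.
print(f"{weight_vram_gib(7e9, 'fp16'):.1f} GiB")
```

This is why quantized 7B models fit a 16 GB consumer GPU while fp16 leaves little headroom for the KV cache.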
Andyyyy64
whichllm

Find the best LLM that runs on your hardware

824 16 2
Hkshoonya
spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection. Practically lossless on modern architectures.

515 0 0
CastelDazur
gpu-memory-guard

CLI tool to check GPU VRAM before loading AI models. Prevent OOM crashes.

246 10 0
back2matching
quantsim-bench

Which quantization should I use? One command benchmarks every quant level on YOUR GPU.

139 0 0
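The trade-off such a benchmark measures can be illustrated with a symmetric integer round-trip: fewer bits shrink memory but raise reconstruction error. A hedged sketch of the general technique (not quantsim-bench's implementation; the sample weights are made up):

```python
# Symmetric per-tensor quantization: map floats to signed ints with one scale,
# dequantize, and measure the round-trip error (RMSE).

def quantize_roundtrip(values: list[float], bits: int = 8) -> list[float]:
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for int8
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

def rmse(a: list[float], b: list[float]) -> float:
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

weights = [0.8, -1.2, 0.05, 0.33, -0.71]            # toy tensor
for bits in (8, 4):
    err = rmse(weights, quantize_roundtrip(weights, bits))
    print(f"int{bits}: RMSE {err:.4f}")
```

A real benchmark then weighs this accuracy loss against throughput and memory measured on the actual GPU, which is why the answer differs per card.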
back2matching
kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers. Supports Ollama + llama.cpp.

122 0 0
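The quantity these compression tools target follows a simple formula: KV cache bytes = 2 (K and V) × layers × KV heads × head dim × sequence length × bytes per element × batch. A sketch with Llama-2-7B-style numbers (the config values are assumptions for illustration, not kvcache-bench output):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2, batch: int = 1) -> int:
    """Uncompressed KV cache size: one K and one V tensor per layer per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes * batch

# Llama-2-7B-style config: 32 layers, 32 KV heads, head_dim 128, fp16.
gib = kv_cache_bytes(32, 32, 128, seq_len=4096) / 1024**3
print(f"{gib:.1f} GiB at 4096 tokens")  # → 2.0 GiB, growing linearly with context
```

Since the cache grows linearly with context length, long-context inference is where the compression ratios advertised above translate directly into larger usable batch sizes.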
HFerrahoglu
llm-neofetch-plus

LLM-Neofetch++ is an advanced system information tool designed specifically for local LLM (Large Language Model) usage. It provides detailed hardware detection with personalized recommendations for running AI models on your system.

103 1 0
mnisperuza
hcgk-kernel

Hardware Control GateKeeper Kernels for AI inference within frameworks.

86 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery