PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics
Alberto-Codes / turboquant-vllm
TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs
8K 46 5
tanavc1 / llm-autotune
Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.
7K 24 1
brontoguana / krasis
Krasis is no longer distributed via PyPI. Install from GitHub: https://github.com/brontoguana/krasis
5K 447 22
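Since krasis is no longer published to PyPI, it can be installed straight from the linked repository using pip's documented VCS (`git+` URL) support — a sketch assuming the default branch builds as a normal pip-installable package:

```shell
# Install krasis directly from its GitHub repository via pip's Git support
pip install "git+https://github.com/brontoguana/krasis"
```

To pin a specific revision, pip's VCS syntax also accepts a `@<tag-or-commit>` suffix after the URL.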
BenevolentJoker-JohnL / sollol
Super Ollama Load Balancer - Performance-aware routing for distributed Ollama deployments with Ray, Dask, and adaptive metrics
2K 4 2
EfficientContext / contextpilot
Fast Long-Context Inference via Context Reuse
1K 81 6
wild-edge / wildedge-sdk
Python SDK for WildEdge
663 13 1
alibaba / torch-quant
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
399 924 168
ManuelSLemos / rabbitllm
Run 70B+ LLMs on a single 4GB GPU — no quantization required. Layer-streaming inference for consumer hardware.
344 53 9
ZFTurbo / kito
Optimizes the layer structure of a Keras model to reduce computation time
319 157 18
saikoushiknalubola / thinkrouter
Cut LLM reasoning-token costs by 60% with one line of code
303 2 0
fabriziopfannl / llm-autobatch
Turn single LLM calls into fast micro-batches. Rust core, Python API.
82 4 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery