Dependents of compressed-tensors

24 dependents

Package	Description	Downloads/month
sglang	SGLang is a high-performance serving framework for large language models and mul...	287.7M
vllm	A high-throughput and memory-efficient inference and serving engine for LLMs	9.4M
llmcompressor	Transformers-compatible library for applying various compression algorithms to L...	285K
vllm-tpu	A high-throughput and memory-efficient inference and serving engine for LLMs	143K
vllm-cpu	Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe...	31K
aphrodite-engine	Large-scale LLM inference engine	7K
vllm-ascend	Community maintained hardware plugin for vLLM on Ascend	7K
sglang-kt	SGLang is a high-performance serving framework for large language models and mul...	4K
vllm-cpu-avx512bf16	Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe...	3K
vllm-cpu-avx512vnni	vLLM CPU inference engine (AVX512 + VNNI optimized)	3K
vllm-cpu-amxbf16	Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe...	3K
vllm-cpu-avx512	vLLM CPU inference engine (AVX512 optimized)	2K
nori-captioner	Simple local web UI for captioning image/video datasets with optional local VLM ...	487
vllm-kunlun	vLLM Kunlun3 backend plugin	464
vllm-hust	A high-throughput and memory-efficient inference and serving engine for LLMs	437
wxy-test	A high-throughput and memory-efficient inference and serving engine for LLMs	375
ai-dynamo-vllm	A high-throughput and memory-efficient inference and serving engine for LLMs	344
mmirage	Modular Multimodal Intelligent Reformatting and Augmentation Generation Engine -...	260
power-sglang-cuda124	SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support	186
vllm-emissary	A high-throughput and memory-efficient inference and serving engine for LLMs	132
vllm-usf	A high-throughput and memory-efficient inference and serving engine for LLMs	115
vllm-test-tpu	A high-throughput and memory-efficient inference and serving engine for LLMs	80
vllm-fixed	A high-throughput and memory-efficient inference and serving engine for LLMs	42
sglang-cpu	SGLang is a fast serving framework for large language models and vision language...	2