24 dependents
| Package description | Downloads/month |
| --- | ---: |
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| Transformers-compatible library for applying various compression algorithms to L... | 285K |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| Large-scale LLM inference engine | 7K |
| Community maintained hardware plugin for vLLM on Ascend | 7K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| Simple local web UI for captioning image/video datasets with optional local VLM ... | 487 |
| vLLM Kunlun3 backend plugin | 464 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| Modular Multimodal Intelligent Reformatting and Augmentation Generation Engine -... | 260 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 115 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 80 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 42 |
| SGLang is a fast serving framework for large language models and vision language... | 2 |