15 dependents
| Package | Description | Downloads/month |
|---------|-------------|-----------------|
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| | Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| | Large-scale LLM inference engine | 7K |
| | Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| | vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| | Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| | vLLM CPU inference engine (AVX512 optimized) | 2K |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 115 |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 80 |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 42 |