32 dependents

| Description | Downloads/month |
| --- | --- |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Aud... | 10K |
| Large-scale LLM inference engine | 7K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| An opinionated Llama Server engine with a focus on agentic tasks | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| General Information, model certifications, and benchmarks for nm-vllm enterprise... | 666 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 345 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| Python functions backed by language models | 338 |
| Tools for detecting bias patterns of LLMs | 234 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 209 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 176 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 115 |
| Structured context for code projects | 110 |
| Structured context for code projects | 105 |
| llama-index prompts utils lmformatenforcer integration | 95 |
| llama-index prompts lmformatenforcer integration | 94 |
| llama-index prompts lmformatenforcer utils integration | 92 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 80 |
| llama-index prompts lmformatenforcer integration | 66 |
| llama-index prompts utils lmformatenforcer integration | 65 |
| Add your description here | 59 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 42 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 41 |