32 dependents
| Description | Downloads/month |
|---|---|
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Aud... | 10K |
| Large-scale LLM inference engine | 7K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| An opinionated Llama Server engine with a focus on agentic tasks | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| General Information, model certifications, and benchmarks for nm-vllm enterprise... | 666 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 345 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| Python functions backed by language models | 338 |
| Tools for detecting bias patterns of LLMs | 234 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 209 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 176 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 115 |
| Structure context for code project | 110 |
| Structure context for code project | 105 |
| llama-index prompts utils lmformatenforcer integration | 95 |
| llama-index prompts lmformatenforcer integration | 94 |
| llama-index prompts lmformatenforcer utils integration | 92 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 80 |
| llama-index prompts lmformatenforcer integration | 66 |
| llama-index prompts utils lmformatenforcer integration | 65 |
| Add your description here | 59 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 42 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 41 |