27 dependents
| Description | Downloads/month |
| --- | ---: |
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| A toolset for compressing, deploying and serving LLM | 123K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| Large-scale LLM inference engine | 7K |
| Synthetic Data Engine 💎 | 7K |
| Community maintained hardware plugin for vLLM on Ascend | 7K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K |
| A toolset for compressing, deploying and serving LLM | 887 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| An on-premises, OCR-free unstructured data extraction, markdown conversion and b... | 431 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| Modular Multimodal Intelligent Reformatting and Augmentation Generation Engine -... | 260 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 80 |
| Constrained sampling for language models. | 70 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 42 |
| SGLang is a fast serving framework for large language models and vision language... | 2 |