26 dependents
Package description | Downloads/month
SGLang is a high-performance serving framework for large language models and mul... | 287.7M
A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M
A high-throughput and memory-efficient inference and serving engine for LLMs | 143K
A guidance language for controlling large language models. | 34K
Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K
TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K
Large-scale LLM inference engine | 7K
SGLang is a high-performance serving framework for large language models and mul... | 4K
Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K
vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K
Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K
vLLM CPU inference engine (AVX512 optimized) | 2K
INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K
Guided Infilling Modeling Toolkit | 713
vLLM Kunlun3 backend plugin | 464
A high-throughput and memory-efficient inference and serving engine for LLMs | 437
Qwen-focused MLX vision-language chat library with batched multimodal chat. | 410
A high-throughput and memory-efficient inference and serving engine for LLMs | 375
A high-throughput and memory-efficient inference and serving engine for LLMs | 344
MCP server with tools to build lark grammars compatible with llguidance | 224
SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186
A high-throughput and memory-efficient inference and serving engine for LLMs | 132
A high-throughput and memory-efficient inference and serving engine for LLMs | 80
Constrained sampling for language models. | 70
A high-throughput and memory-efficient inference and serving engine for LLMs | 42
SGLang is a fast serving framework for large language models and vision language... | 2