27 dependents
| Description | Downloads/month |
| --- | ---: |
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| A toolset for compressing, deploying and serving LLM | 123K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| Large-scale LLM inference engine | 7K |
| Synthetic Data Engine 💎 | 7K |
| Community maintained hardware plugin for vLLM on Ascend | 7K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K |
| A toolset for compressing, deploying and serving LLM | 887 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| An on-premises, OCR-free unstructured data extraction, markdown conversion and b... | 431 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| Modular Multimodal Intelligent Reformatting and Augmentation Generation Engine -... | 260 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 80 |
| Constrained sampling for language models. | 70 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 42 |
| SGLang is a fast serving framework for large language models and vision language... | 2 |