Dependents of llguidance

26 dependents

Package	Description	Downloads/month
sglang	SGLang is a high-performance serving framework for large language models and mul...	287.7M
vllm	A high-throughput and memory-efficient inference and serving engine for LLMs	9.4M
vllm-tpu	A high-throughput and memory-efficient inference and serving engine for LLMs	143K
guidance	A guidance language for controlling large language models.	34K
vllm-cpu	Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe...	31K
tensorrt-llm	TensorRT LLM provides users with an easy-to-use Python API to define Large Langu...	16K
aphrodite-engine	Large-scale LLM inference engine	7K
sglang-kt	SGLang is a high-performance serving framework for large language models and mul...	4K
vllm-cpu-avx512bf16	Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe...	3K
vllm-cpu-avx512vnni	vLLM CPU inference engine (AVX512 + VNNI optimized)	3K
vllm-cpu-amxbf16	Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe...	3K
vllm-cpu-avx512	vLLM CPU inference engine (AVX512 optimized)	2K
infinity-parser2	INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced...	1K
gimkit	Guided Infilling Modeling Toolkit	713
vllm-kunlun	vLLM Kunlun3 backend plugin	464
vllm-hust	A high-throughput and memory-efficient inference and serving engine for LLMs	437
mlx-vlm-batch-outlines	Qwen-focused MLX vision-language chat library with batched multimodal chat.	410
wxy-test	A high-throughput and memory-efficient inference and serving engine for LLMs	375
ai-dynamo-vllm	A high-throughput and memory-efficient inference and serving engine for LLMs	344
guidance-lark-mcp	MCP server with tools to build lark grammars compatible with llguidance	224
power-sglang-cuda124	SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support	186
vllm-emissary	A high-throughput and memory-efficient inference and serving engine for LLMs	132
vllm-test-tpu	A high-throughput and memory-efficient inference and serving engine for LLMs	80
casa-lm	Constrained sampling for language models.	70
vllm-fixed	A high-throughput and memory-efficient inference and serving engine for LLMs	42
sglang-cpu	SGLang is a fast serving framework for large language models and vision language...	2