Dependents of flashinfer-python

16 dependents

Package	Description	Downloads/month
sglang	SGLang is a high-performance serving framework for large language models and mul...	287.7M
vllm	A high-throughput and memory-efficient inference and serving engine for LLMs	9.4M
tensorrt-llm	TensorRT LLM provides users with an easy-to-use Python API to define Large Langu...	16K
aphrodite-engine	Large-scale LLM inference engine	7K
pruna-pro	Smash your AI models - Pro Version	6K
sglang-kt	SGLang is a high-performance serving framework for large language models and mul...	4K
flashinfer-bench	Building the Virtuous Cycle for AI-driven LLM Systems	2K
nemo-export-deploy	NeMo Export and Deploy - a library to export and deploy LLMs and MMs	1K
exllamav3-inference	Single-user optimized inference wrapper for ExLlamaV3	260
tokasaurus	The little (LLM) engine that could!	206
stindex	A multi-dimensional information extraction system that uses LLMs to extract temp...	187
power-sglang-cuda124	SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support	186
kimi-k2-optimizer	Kimi K2.5 (1.1T) optimization suite for RTX 3090 with aggressive RAM optimizatio...	138
vajra-nightly	A high-throughput and low-latency LLM inference system	125
vox-serve	A serving system for speech language models.	109
chitu	A high-performance inference framework for large language models, focusing on ef...	75