28 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| Fast, Flexible and Portable Structured Generation | 6.4M |
| FlashInfer: Kernel Library for LLM Serving | 4M |
| | 3M |
| A tile level programming language to generate high performance code. | 475K |
| Fast and memory-efficient exact attention | 406K |
| Fast and memory-efficient exact attention | 25K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE. | 13K |
| Large-scale LLM inference engine | 7K |
| | 5K |
| Fast and memory-efficient exact attention | 4K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| A tile level programming language to generate high performance code. | 3K |
| Building the Virtuous Cycle for AI-driven LLM Systems | 2K |
| | 1K |
| The MLIR-TensorRT JAX plugin. | 759 |
| SGLang fork of DeepGemm | 747 |
| FlashInfer: Kernel Library for LLM Serving | 555 |
| Tilus is a tile-level kernel programming language with explicit control over sha... | 437 |
| Fast and memory-efficient exact attention | 357 |
| | 343 |
| Fast and lightweight multimodal LLM inference engine for mobile and edge devices | 154 |
| cuLA CUDA extension | 92 |
| CUDA kernel library for Kestrel (Jetson PT25 backend) | 64 |
| CUDA kernel library for Kestrel (Jetson PT24 backend) | 55 |
| FastFlow + Apache TVM | 36 |