22 dependents
| Package | Description | Downloads/month |
| --- | --- | --- |
| | SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| | FlashInfer: Kernel Library for LLM Serving | 4M |
| | | 3M |
| | Fast and memory-efficient exact attention | 406K |
| | Fast Polar Decomposition for Muon | 105K |
| | Fast and memory-efficient exact attention | 25K |
| | TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| | Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE. | 13K |
| | Large-scale LLM inference engine | 7K |
| | Fast and memory-efficient exact attention | 4K |
| | SGLang is a high-performance serving framework for large language models and mul... | 4K |
| | | 4K |
| | FlashInfer: Kernel Library for LLM Serving | 555 |
| Dao-AILab fa4 | Fast and memory-efficient exact attention | 357 |
| | Baseten Kernel Library | 338 |
| | NVIDIA SOL ExecBench - GPU kernel evaluation framework | 294 |
| | SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| | NCCL device-initiated API for the CuTe Python DSL | 110 |
| | cuLA CUDA extension | 92 |
| | CUDA kernel library for Kestrel (Jetson PT25 backend) | 64 |
| | CUDA kernel library for Kestrel (Jetson PT24 backend) | 55 |