22 dependents
| Package | Description | Downloads/month |
| --- | --- | --- |
| | SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| | FlashInfer: Kernel Library for LLM Serving | 4M |
| | | 3M |
| | Fast and memory-efficient exact attention | 406K |
| | Fast Polar Decomposition for Muon | 105K |
| | Fast and memory-efficient exact attention | 25K |
| | TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| | Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE. | 13K |
| | Large-scale LLM inference engine | 7K |
| | Fast and memory-efficient exact attention | 4K |
| | SGLang is a high-performance serving framework for large language models and mul... | 4K |
| | | 4K |
| | FlashInfer: Kernel Library for LLM Serving | 555 |
| Dao-AILab fa4 | Fast and memory-efficient exact attention | 357 |
| | Baseten Kernel Library | 338 |
| | NVIDIA SOL ExecBench - GPU kernel evaluation framework | 294 |
| | SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| | NCCL device-initiated API for the CuTe Python DSL | 110 |
| | cuLA CUDA extension | 92 |
| | CUDA kernel library for Kestrel (Jetson PT25 backend) | 64 |
| | CUDA kernel library for Kestrel (Jetson PT24 backend) | 55 |