28 dependents

| Package | Description | Downloads/month |
| --- | --- | --- |
| | SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| | A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| | Fast, Flexible and Portable Structured Generation | 6.4M |
| | FlashInfer: Kernel Library for LLM Serving | 4M |
| | | 3M |
| | A tile level programming language to generate high performance code. | 475K |
| | Fast and memory-efficient exact attention | 406K |
| | Fast and memory-efficient exact attention | 25K |
| | TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| | Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE. | 13K |
| | Large-scale LLM inference engine | 7K |
| | | 5K |
| | Fast and memory-efficient exact attention | 4K |
| | SGLang is a high-performance serving framework for large language models and mul... | 4K |
| | A tile level programming language to generate high performance code. | 3K |
| | Building the Virtuous Cycle for AI-driven LLM Systems | 2K |
| | | 1K |
| | The MLIR-TensorRT JAX plugin. | 759 |
| | SGLang fork of DeepGemm | 747 |
| | FlashInfer: Kernel Library for LLM Serving | 555 |
| | Tilus is a tile-level kernel programming language with explicit control over sha... | 437 |
| Dao-AILab fa4 | Fast and memory-efficient exact attention | 357 |
| | | 343 |
| | Fast and lightweight multimodal LLM inference engine for mobile and edge devices | 154 |
| | cuLA CUDA extension | 92 |
| | CUDA kernel library for Kestrel (Jetson PT25 backend) | 64 |
| | CUDA kernel library for Kestrel (Jetson PT24 backend) | 55 |
| | FastFlow + Apache TVM | 36 |