28 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| Fast, Flexible and Portable Structured Generation | 6.4M |
| FlashInfer: Kernel Library for LLM Serving | 4M |
| | 3M |
| A tile level programming language to generate high performance code. | 475K |
| Fast and memory-efficient exact attention | 406K |
| Fast and memory-efficient exact attention | 25K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE. | 13K |
| Large-scale LLM inference engine | 7K |
| | 5K |
| Fast and memory-efficient exact attention | 4K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| A tile level programming language to generate high performance code. | 3K |
| Building the Virtuous Cycle for AI-driven LLM Systems | 2K |
| | 1K |
| The MLIR-TensorRT JAX plugin. | 759 |
| SGLang fork of DeepGemm | 747 |
| FlashInfer: Kernel Library for LLM Serving | 555 |
| Tilus is a tile-level kernel programming language with explicit control over sha... | 437 |
| Fast and memory-efficient exact attention | 357 |
| | 343 |
| Fast and lightweight multimodal LLM inference engine for mobile and edge devices | 154 |
| cuLA CUDA extension | 92 |
| CUDA kernel library for Kestrel (Jetson PT25 backend) | 64 |
| CUDA kernel library for Kestrel (Jetson PT24 backend) | 55 |
| FastFlow + Apache TVM | 36 |