22 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| FlashInfer: Kernel Library for LLM Serving | 4M |
| | 3M |
| Fast and memory-efficient exact attention | 406K |
| Fast Polar Decomposition for Muon | 105K |
| Fast and memory-efficient exact attention | 25K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE. | 13K |
| Large-scale LLM inference engine | 7K |
| Fast and memory-efficient exact attention | 4K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| | 4K |
| FlashInfer: Kernel Library for LLM Serving | 555 |
| Fast and memory-efficient exact attention | 357 |
| Baseten Kernel Library | 338 |
| NVIDIA SOL ExecBench - GPU kernel evaluation framework | 294 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| NCCL device-initiated API for the CuTe Python DSL | 110 |
| cuLA CUDA extension | 92 |
| CUDA kernel library for Kestrel (Jetson PT25 backend) | 64 |
| CUDA kernel library for Kestrel (Jetson PT24 backend) | 55 |