13 dependents
Package | Description | Downloads/month
 | | 3M
 | A tile level programming language to generate high performance code. | 475K
 | Fast and memory-efficient exact attention | 406K
 | Fast and memory-efficient exact attention | 25K
 | TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K
 | CUDA kernel library for Kestrel | 13K
 | a fast, efficient inference engine for moondream | 11K
 | Fast and memory-efficient exact attention | 4K
 | Dao-AILab fa4 |
 | Fast and memory-efficient exact attention | 357
 | NVIDIA SOL ExecBench - GPU kernel evaluation framework | 294
 | Diligent framework for python | 293
 | CUDA kernel library for Kestrel (Jetson PT25 backend) | 64
 | CUDA kernel library for Kestrel (Jetson PT24 backend) | 55