A Python DSL for writing NVIDIA PTX for Hopper and Blackwell in JAX and PyTorch
CAFFEE Editor, a command-line text editor
A lightweight tool for detecting and querying NVIDIA GPU architectures (SM/CC), and generating `-gencode` flags for CUDA builds
PTX Inject and Stack PTX for Python
TeraXLang - Triton Extension for LLM. As fast as FlashAttention, FlashMLA, etc.
INT8 Sparse Tensor Core GEMM for PyTorch — built for Windows