PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
linkedin
liger-kernel

Efficient Triton Kernels for LLM Training

792K 6K 526
thu-ml
sageattention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

156K 3K 403
linkedin
liger-kernel-nightly

Efficient Triton Kernels for LLM Training

63K 6K 526
meta-pytorch
tritonparse

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

17K 204 26
clearml
clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

11K 164 50
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

8K 46 5
notAI-tech
fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

4K 103 17
remcofl
hilbertsfc

Ultra-fast 2D & 3D Hilbert curve kernels in Python. JIT compiled, branchless, L1-cache-friendly lookup tables, loop unrolling, SIMD, and multi-threading.

1K 6 1
chansigit
torchgw

Fast Sampled Gromov-Wasserstein optimal transport solver — pure PyTorch, scalable, differentiable

1K 1 0
ot-triton-lab
flash-sinkhorn

Sinkhorn optimal transport kernels in PyTorch + Triton (squared Euclidean, no cost matrix materialization).

1K 189 19
DeepLink-org
dlblas

DLBlas: clean and efficient kernels

902 39 12
toyaix
triton-runner

Multi-Level Triton Runner supporting Python, IR, PTX, AMDGCN, cubin, and hsaco.

822 95 5
notAI-tech
fdclient

fastDeploy Python client

815 103 17
flagos-ai
flag-gems

FlagGems is an operator library for large language models implemented in the Triton Language.

782 982 349
HKUSTDial
flash-sparse-attn

Trainable fast and memory-efficient sparse attention

518 637 60
dame-cell
triformer

Transformers components but in Triton

515 34 1
DeepAuto-AI
hip-attn

Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.

445 151 15
followthesapper
atlas-quantum

GPU-accelerated quantum tensor network simulator with adaptive MPS

331 0 0
msclock
transformersplus

Adds some extra features to transformers

327 0 0
rayliuca
grammared-language

Adding Grammarly (and other) open source ML models to LanguageTool

314 6 0
toyaix
tritonllm

LLM inference via Triton (flexible and modular): focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model

243 114 5
Argonaut790
fused-turboquant

Fused Triton encode/decode kernels for TurboQuant KV cache compression, powered by Randomized Hadamard Transform.

208 8 0
rodrigobaron
quick-deploy

Optimize, convert, and deploy machine learning models as a fast inference API using Triton and ORT. Currently supports Hugging Face Transformers, PyTorch, TensorFlow, scikit-learn, and XGBoost models.

207 6 1
Sharveswar007
ssblast

FP8 per-tile scaled linear solver for consumer NVIDIA GPUs

175 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery