PyPI Stats

Triton Python Packages

Python packages with the GitHub topic triton, sorted by relevance and listed with monthly downloads and GitHub stars.
linkedin
liger-kernel

Efficient Triton Kernels for LLM Training

790K 6K 526
thu-ml
sageattention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

155K 3K 403
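SageAttention's speedup comes from quantizing the attention inputs to low precision (e.g. INT8) before the matrix multiplies, then dequantizing with a stored scale. A minimal sketch of symmetric per-tensor INT8 quantization in plain Python — illustrative helper names, not SageAttention's actual API:

```python
def int8_quantize(x):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = max(abs(v) for v in x) / 127.0 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale

def int8_dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]

x = [0.12, -1.5, 0.0, 0.98]
q, s = int8_quantize(x)
x_hat = int8_dequantize(q, s)
# Reconstruction error is bounded by half a quantization step.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(x, x_hat))
```

The package's contribution is doing this per-block inside fused Triton kernels so the quantized matmuls run on INT8 tensor cores without an accuracy loss end to end.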
linkedin
liger-kernel-nightly

Efficient Triton Kernels for LLM Training

63K 6K 526
meta-pytorch
tritonparse

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

17K 204 26
clearml
clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

11K 164 50
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

7K 46 5
notAI-tech
fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

4K 103 17
remcofl
hilbertsfc

Ultra-fast 2D & 3D Hilbert curve kernels in Python. JIT compiled, branchless, L1-cache-friendly lookup tables, loop unrolling, SIMD, and multi-threading.

2K 6 1
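hilbertsfc precomputes lookup tables for this mapping; the underlying 2-D Hilbert curve index-to-coordinate conversion can be sketched with the classic iterative rotate-and-flip algorithm (a plain-Python illustration, not the package's JIT-compiled kernels):

```python
def hilbert_d2xy(order, d):
    """Map distance d along a 2-D Hilbert curve over a 2**order x 2**order
    grid to (x, y) coordinates, via the classic rotate-and-flip iteration."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:  # flip the quadrant
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x  # rotate so sub-curves join end to end
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Consecutive indices land on adjacent cells (Manhattan distance 1),
# which is what makes the curve cache-friendly for spatial data.
pts = [hilbert_d2xy(3, d) for d in range(64)]
assert len(set(pts)) == 64  # bijection over the 8x8 grid
assert all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1 for a, b in zip(pts, pts[1:]))
```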
chansigit
torchgw

Fast Sampled Gromov-Wasserstein optimal transport solver — pure PyTorch, scalable, differentiable

1K 1 0
DeepLink-org
dlblas

DLBlas: clean and efficient kernels

961 39 12
toyaix
triton-runner

Multi-Level Triton Runner supporting Python, IR, PTX, AMDGCN, cubin, and hsaco.

866 95 5
flagos-ai
flag-gems

FlagGems is an operator library for large language models implemented in the Triton Language.

785 982 349
notAI-tech
fdclient

fastDeploy python client

753 103 17
dame-cell
triformer

Transformer components, but in Triton

571 34 1
HKUSTDial
flash-sparse-attn

Trainable, fast, and memory-efficient sparse attention

534 637 60
DeepAuto-AI
hip-attn

Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.

432 151 15
msclock
transformersplus

Adds some extra features to transformers

407 0 0
ot-triton-lab
flash-sinkhorn

Sinkhorn optimal transport kernels in PyTorch + Triton (squared Euclidean, no cost matrix materialization).

365 189 19
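The iteration flash-sinkhorn accelerates is the standard entropy-regularized Sinkhorn loop: build a kernel matrix from the costs, then alternately rescale rows and columns until both marginals match. A toy dense version in plain Python for intuition — the package's point is fusing this into Triton kernels so the cost matrix is never materialized:

```python
import math

def sinkhorn(cost, a, b, eps=1.0, iters=1000):
    """Entropy-regularized optimal transport: alternately rescale rows and
    columns of K = exp(-cost/eps) until the plan has marginals a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: P[i][j] = u[i] * K[i][j] * v[j]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Uniform marginals over 3 points, squared-Euclidean costs.
xs, ys = [0.0, 1.0, 2.0], [0.5, 1.5, 2.5]
C = [[(xi - yj) ** 2 for yj in ys] for xi in xs]
P = sinkhorn(C, [1 / 3] * 3, [1 / 3] * 3)
assert all(abs(sum(row) - 1 / 3) < 1e-6 for row in P)  # row marginals converge
```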
followthesapper
atlas-quantum

GPU-accelerated quantum tensor network simulator with adaptive MPS

357 0 0
rayliuca
grammared-language

Adding Grammarly (and other) open source ML models to LanguageTool

346 6 0
toyaix
tritonllm

Flexible, modular LLM inference via Triton, focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model

281 114 5
Argonaut790
fused-turboquant

Fused Triton encode/decode kernels for TurboQuant KV cache compression, powered by Randomized Hadamard Transform.

226 8 0
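A randomized Hadamard transform rotates vectors before quantization to spread outliers across coordinates; its deterministic core is the fast Walsh-Hadamard transform, which runs in O(n log n) with only additions and subtractions. A plain-Python sketch of that core (illustrative only, not the package's fused Triton kernels, which also apply random sign flips):

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (input length must be a power of 2).
    The Hadamard matrix is orthogonal up to scaling, so applying the
    transform twice and dividing by len(x) recovers the input."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly step
        h *= 2
    return x

v = [1.0, -2.0, 0.5, 3.0]
w = fwht(v)
roundtrip = [c / len(v) for c in fwht(w)]
assert all(abs(a - b) < 1e-12 for a, b in zip(v, roundtrip))
```

Because the transform is self-inverse up to a 1/n factor, the decode kernel can undo the rotation exactly after dequantizing the KV cache.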
rodrigobaron
quick-deploy

Optimize, convert and deploy machine learning models as fast inference API using Triton and ORT. Currently supports Hugging Face transformers, PyTorch, TensorFlow, scikit-learn and XGBoost models.

221 6 1
Sharveswar007
ssblast

First open-source FP8 linear solver for consumer NVIDIA GPUs — 2-3x faster than cuBLAS FP64. pip install ssblast

185 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery