PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
linkedin
liger-kernel

Efficient Triton Kernels for LLM Training

792K 6K 526
thu-ml
sageattention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

156K 3K 403
linkedin
liger-kernel-nightly

Efficient Triton Kernels for LLM Training

63K 6K 526
meta-pytorch
tritonparse

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

17K 204 26
clearml
clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

11K 164 50
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

8K 46 5
notAI-tech
fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

4K 103 17
remcofl
hilbertsfc

Ultra-fast 2D & 3D Hilbert curve kernels in Python. JIT compiled, branchless, L1-cache-friendly lookup tables, loop unrolling, SIMD, and multi-threading.

1K 6 1
chansigit
torchgw

Fast Sampled Gromov-Wasserstein optimal transport solver — pure PyTorch, scalable, differentiable

1K 1 0
ot-triton-lab
flash-sinkhorn

Sinkhorn optimal transport kernels in PyTorch + Triton (squared Euclidean, no cost matrix materialization).

1K 189 19
DeepLink-org
dlblas

DLBlas: clean and efficient kernels

902 39 12
toyaix
triton-runner

Multi-Level Triton Runner supporting Python, IR, PTX, AMDGCN, cubin, and hsaco.

822 95 5
notAI-tech
fdclient

fastDeploy Python client

815 103 17
flagos-ai
flag-gems

FlagGems is an operator library for large language models implemented in the Triton Language.

782 982 349
HKUSTDial
flash-sparse-attn

Trainable fast and memory-efficient sparse attention

518 637 60
dame-cell
triformer

Transformers components but in Triton

515 34 1
DeepAuto-AI
hip-attn

Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.

445 151 15
followthesapper
atlas-quantum

GPU-accelerated quantum tensor network simulator with adaptive MPS

331 0 0
msclock
transformersplus

Adds some extra features to transformers

327 0 0
rayliuca
grammared-language

Adding Grammarly (and other) open source ML models to LanguageTool

314 6 0
toyaix
tritonllm

LLM inference via Triton (flexible and modular): focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model

243 114 5
Argonaut790
fused-turboquant

Fused Triton encode/decode kernels for TurboQuant KV cache compression, powered by Randomized Hadamard Transform.

208 8 0
rodrigobaron
quick-deploy

Optimize, convert, and deploy machine learning models as a fast inference API using Triton and ORT. Currently supports Hugging Face Transformers, PyTorch, TensorFlow, scikit-learn, and XGBoost models.

207 6 1
Sharveswar007
ssblast

FP8 per-tile scaled linear solver for consumer NVIDIA GPUs

175 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery