PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
xlite-dev
ffpa-attn

FFPA: extends FlashAttention-2 with Split-D for ~O(1) SRAM complexity at large head dims; 1.8x~3x speedup vs SDPA.

1K 277 16
ot-triton-lab
flash-sinkhorn

Sinkhorn optimal transport kernels in PyTorch + Triton (squared Euclidean, no cost matrix materialization).

1K 189 19
HKUSTDial
flash-sparse-attn

Trainable, fast, and memory-efficient sparse attention

518 637 60
davidkny22
easywheels

Smart GPU wheel installer. Auto-detects CUDA, GPU, torch, and Python.

395 0 0
kyegomez
flashmha

A simple PyTorch implementation of Flash Multi-Head Attention

386 22 4
Mapika
gpkg

GPU package manager — find prebuilt CUDA wheels, build missing ones

321 0 0
DAMO-NLP-SG
inf-cl

[CVPR 2025 Highlight] The official CLIP training codebase for Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.

266 284 12
erfanzar
jax-flash-attn2

A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).

154 34 1
egaoharu-kensei
flash-attention-triton

Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with a custom configuration mode

154 26 0
SmallDoges
flash-dmattn

Flash Dynamic Mask Attention: Fast and Memory-Efficient Trainable Dynamic Mask Sparse Attention

134 594 54
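
Several of the packages above position themselves against PyTorch's built-in scaled dot-product attention (SDPA); ffpa-attn, for example, reports a 1.8x~3x speedup over it. For reference, a minimal SDPA baseline sketch, assuming PyTorch >= 2.0 and purely illustrative tensor shapes:

    import torch
    import torch.nn.functional as F

    # Illustrative shapes: batch, heads, sequence length, head dimension.
    batch, heads, seq, head_dim = 2, 8, 1024, 64
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    q = torch.randn(batch, heads, seq, head_dim, device=device, dtype=dtype)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Fused attention; on CUDA this can dispatch to a FlashAttention-style
    # kernel when shapes and dtypes allow.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    print(out.shape)  # torch.Size([2, 8, 1024, 64])

This is the generic PyTorch baseline, not the API of any package listed here; consult each project's own documentation for its interface.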
Data from PyPI, GitHub, ClickHouse, and BigQuery