Sinkhorn optimal transport kernels in PyTorch + Triton (squared Euclidean cost, computed on the fly without materializing the cost matrix).
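
To illustrate what skipping the cost matrix means, here is a minimal pure-Python reference sketch of log-domain Sinkhorn iterations in which each squared Euclidean cost is recomputed inside the reduction loop instead of being read from a stored n × m matrix. It is not this repository's PyTorch/Triton code: the names `sinkhorn_potentials` and `plan_marginals`, the `eps` and `n_iters` defaults, and the assumption of strictly positive weights are all hypothetical choices made for the example.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def _softmin(points, others, log_w, pot, eps):
    """For each p: -eps * logsumexp_k(log_w[k] + (pot[k] - |p - others[k]|^2) / eps).

    Costs are computed on the fly and folded into a streaming logsumexp,
    so memory stays O(n + m) rather than O(n * m).
    """
    out = []
    for p in points:
        running_max, running_sum = -math.inf, 0.0
        for q, lw, h in zip(others, log_w, pot):
            v = lw + (h - sq_dist(p, q)) / eps
            if v > running_max:
                # Rescale the partial sum to the new maximum.
                running_sum = running_sum * math.exp(running_max - v) + 1.0
                running_max = v
            else:
                running_sum += math.exp(v - running_max)
        out.append(-eps * (running_max + math.log(running_sum)))
    return out


def sinkhorn_potentials(x, y, a, b, eps=0.5, n_iters=200):
    """Alternate dual updates; assumes all weights in a and b are > 0."""
    log_a = [math.log(w) for w in a]
    log_b = [math.log(w) for w in b]
    f = [0.0] * len(x)
    g = [0.0] * len(y)
    for _ in range(n_iters):
        f = _softmin(x, y, log_b, g, eps)
        g = _softmin(y, x, log_a, f, eps)
    return f, g


def plan_marginals(x, y, a, b, f, g, eps):
    """Row and column sums of P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps),
    accumulated entry by entry without storing P or C."""
    rows = [0.0] * len(x)
    cols = [0.0] * len(y)
    for i, (xi, ai, fi) in enumerate(zip(x, a, f)):
        for j, (yj, bj, gj) in enumerate(zip(y, b, g)):
            p_ij = ai * bj * math.exp((fi + gj - sq_dist(xi, yj)) / eps)
            rows[i] += p_ij
            cols[j] += p_ij
    return rows, cols


# Tiny usage example: three source points, two target points, uniform weights.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y = [[0.5, 0.5], [1.0, 1.0]]
a = [1.0 / 3.0] * 3
b = [0.5, 0.5]
f, g = sinkhorn_potentials(x, y, a, b, eps=0.5, n_iters=500)
rows, cols = plan_marginals(x, y, a, b, f, g, eps=0.5)
```

At convergence `rows` should match `a` and `cols` should match `b`. A GPU kernel would typically apply the same streaming reduction over tiles of points rather than single pairs, but the tiling strategy is outside what this sketch shows.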