Trainable, fast, and memory-efficient sparse attention
Flash Dynamic Mask Attention: Fast and Memory-Efficient Trainable Dynamic Mask Sparse Attention