Attention Is All You Need Python Packages

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

54K 3K 210

yunchang

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

38K 666 79

qwen

My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities". They haven't released model code yet, sooo...

4K 12 2

sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch

3K 1K 320

crabnet

Predict materials properties using only the composition information!

2K 17 7

longnet

Implementation of plug-and-play attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

1K 718 61

mambatransformer

Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling

518 220 16

alr-transformer

Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers"

373 13 1

tiny-gptv

Simple Implementation of TinyGPTV in super simple Zeta lego blocks

367 16 0

nonlinear-transformer

An implementation of a Transformer model that generates tokens non-linearly, all at once, like the heptapods from Arrival

362 10 0

mgqa

The open-source implementation of multi-grouped query attention from the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"

285 16 1

screenai

Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"

274 383 37

autort-swarms

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

258 43 3

video-vit

Paper - Pytorch

244 11 0

liquidnet

Implementation of Liquid Nets in PyTorch

231 71 11

audio-flamingo

Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"

224 40 1

m2pt

Implementation of M2PT in PyTorch from the paper: "Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities"

206 14 1

mamba-former

Paper - Pytorch

191 21 1

mobilevlm

Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices"

189 15 0

cross-attn

The open-source implementation of the cross-attention mechanism from the paper: "Jointly Training Large Autoregressive Multimodal Models"

186 37 1

simple-transformer

A simple and modular implementation of the Transformer model in PyTorch

176 4 0

jamba

Jamba - PyTorch

174 214 15

hlt-torch

Paper - Pytorch

166 62 9

kosmos2-torch

Kosmos - PyTorch

164 74 6