A 35M-parameter goldfish language model with a 10-second memory
World-Structured Feed-Forward Network: A novel FFN architecture with multi-head latent representations for language models