A non-Transformer hierarchical recurrent network with differentiable Gumbel-Softmax routing and bounded memory slots. Runs 7B+ parameter models layer-by-layer on low-budget GPUs.
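
To illustrate what differentiable Gumbel-Softmax routing means in general, here is a minimal sketch in plain Python. It is not taken from this project's code: the names `gumbel_softmax` and `route`, the temperature parameter `tau`, and the idea of blending candidate vectors are all assumptions made for illustration. The sketch shows only the forward computation. In an autograd framework the same arithmetic would let gradients flow back to the routing logits, which is what makes the routing trainable.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return relaxed one-hot routing weights (hypothetical helper)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample so both log() calls stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise is -log(-log(u)); tau is the temperature.
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def route(candidates, logits, tau=1.0, rng=random):
    """Blend candidate vectors using the soft routing weights (assumed design)."""
    weights = gumbel_softmax(logits, tau, rng)
    dim = len(candidates[0])
    mixed = [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)]
    return mixed, weights


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable sampling
    candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    mixed, weights = route(candidates, logits=[0.5, 2.0, 0.1], tau=0.5, rng=rng)
    print("weights:", weights)
    print("mixed:", mixed)
```

Lower values of `tau` push the weights toward a hard one-hot choice, while higher values spread them more evenly across routes.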