Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
Package for gradient accumulation in TensorFlow.
NFNets implemented in PyTorch.