-
Fantastic Pretraining Optimizers and Where to Find Them 2.2: The Hitchhiker's Guide to the Weight Norm Theory
Why weight decay sets the effective step size, what weight norms are really doing, and how that theory motivates Hyperball.
-
Fantastic Pretraining Optimizers and Where to Find Them 2.1: Hyperball Optimization
Hyperball optimization, norm-constrained updates, and why explicit weight-norm control can speed up pretraining.