From Weight Decay to Hyperball Optimization

Weight decay is often described as capacity control, but in modern scale-invariant architectures it instead sets the effective step size. Xingyu, Kaifeng, Tengyu, Percy, and I put together a full-length interactive article that walks through the math and the demos and introduces Hyperball, an optimizer that removes weight decay entirely by constraining norms directly.
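To make the step-size claim concrete, here is a minimal NumPy sketch on a toy scale-invariant loss. It is an illustration under my own assumptions, not the Hyperball algorithm from the article: the function names `loss`, `grad`, and `sphere_step` and the toy objective are hypothetical. It shows that the gradient of a scale-invariant loss shrinks as 1/||w||, so the norm (which weight decay would otherwise control) determines how far a step moves the direction of w, and that fixing the norm after each step removes that dependence.

```python
import numpy as np

def loss(w, x):
    # Toy scale-invariant loss (an assumption for illustration):
    # it depends only on the direction w / ||w||, not on ||w||.
    u = w / np.linalg.norm(w)
    return float(-(u @ x))

def grad(w, x):
    # Analytic gradient of the toy loss with respect to w:
    # -(x - (u.x) u) / ||w||, which is orthogonal to w and scales as 1/||w||.
    n = np.linalg.norm(w)
    u = w / n
    return -(x - (u @ x) * u) / n

def sphere_step(w, x, lr, radius):
    # Hypothetical norm-constrained update: take a plain gradient step,
    # then rescale the weights back onto the sphere of the given radius.
    w_new = w - lr * grad(w, x)
    return radius * w_new / np.linalg.norm(w_new)

w = np.array([3.0, 4.0])
x = np.array([1.0, 0.0])

# Doubling the weights leaves the loss unchanged but halves the gradient.
same_loss = np.isclose(loss(2 * w, x), loss(w, x))
halved_grad = np.allclose(grad(2 * w, x), grad(w, x) / 2)

# With the norm pinned, the learning rate alone sets the step size.
w_next = sphere_step(w, x, lr=0.1, radius=5.0)
```

Because the gradient is orthogonal to w, an unconstrained gradient step can only grow the norm; that growth is what weight decay counteracts in the usual setup, and what the rescaling replaces in this sketch.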

All of the interactive plots, sliders, and citations live in the standalone build below. You can read it inline or pop it out into a new tab if you want a full-screen view.