III — The Neural Network, Assembled → Chapter 12
FROM SYSTEMS TO FRONTIER ML

Softmax & the exponential family

Smooth argmax, numerical stability (the max-subtraction trick), online/streaming softmax. Kernel: vectorized stable softmax.
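
As a rough illustration of the max-subtraction trick named above, here is a minimal pure-Python sketch. The function name `stable_softmax` is an assumption made for this example, and the code is scalar rather than vectorized, so it shows the idea and is not the chapter's kernel. Subtracting the largest logit before exponentiating leaves the result unchanged, because the common factor cancels in the ratio, and it keeps every exponent at or below zero so `exp` cannot overflow.

```python
import math


def stable_softmax(logits):
    """Softmax over a list of finite floats, using max subtraction.

    Hypothetical helper for illustration; assumes all inputs are finite.
    """
    if not logits:
        return []
    m = max(logits)                            # largest logit
    exps = [math.exp(x - m) for x in logits]   # every exponent is <= 0
    total = sum(exps)                          # >= 1, since exp(0) is included
    return [e / total for e in exps]


# Logits this large would make a naive math.exp(x) raise OverflowError.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
```

Because softmax is invariant to adding a constant to every logit, `probs` here should match the softmax of `[0.0, 1.0, 2.0]` up to floating-point rounding.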

§1 Softmax: properties + numerical stability
§2 Cross-entropy, KL divergence, the loss landscape
§3 Online softmax: the FlashAttention key
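
The online (streaming) softmax listed in §3 can be sketched as follows. This is a minimal pure-Python version under stated assumptions: the name `online_softmax` is hypothetical, inputs are assumed finite, and the recurrence shown is the commonly used running-max and running-sum form, not necessarily the exact formulation the chapter develops. The normalizer is computed in one pass by rescaling the running sum whenever a new maximum appears.

```python
import math


def online_softmax(logits):
    """Softmax whose normalizer is built in a single streaming pass.

    Hypothetical helper for illustration; assumes all inputs are finite.
    """
    m = float("-inf")   # running maximum seen so far
    d = 0.0             # running sum of exp(x - m) for the values seen so far
    for x in logits:
        m_new = max(m, x)
        # Rescale the old sum to the new maximum, then add the new term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # Second pass only normalizes; the max and the sum are already known.
    return [math.exp(x - m) / d for x in logits]


streamed = online_softmax([3.0, 1000.0, -2.0, 999.0])
```

The rescaling step is what lets the normalizer be accumulated over chunks without first scanning the whole input for its maximum.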
