Calculus & gradients refreshed

§1 Derivatives as sensitivity
A derivative is sensitivity — "how much does the output move when the input nudges?" The numerical version has a famous U-curve of error that lands exactly where IEEE-754 says it should.
§2 Partials, gradient, Jacobian
The gradient is a vector pointing uphill; the Jacobian is a matrix that's the best linear approximation of a vector-valued function near a point. Both generalise §1's derivative to many inputs and many outputs.
§3 The chain rule — engine of backprop
Composing functions composes their Jacobians by matrix multiplication. Walking that product right-to-left is forward mode; left-to-right is reverse mode (backprop). The whole architecture of training neural networks falls out of this one rule.