Derivatives as sensitivity, the chain rule, Jacobians — quiet setup for backprop.
A derivative is sensitivity — "how much does the output move when the input nudges?" The numerical version has a famous U-curve of error that lands exactly where IEEE-754 says it should.
The gradient is a vector pointing uphill; the Jacobian is a matrix that's the best linear approximation of a vector-valued function near a point. Both generalise §1's derivative to many inputs and many outputs.
Composing functions composes their Jacobians by matrix multiplication. Walking that product right-to-left is forward mode; left-to-right is reverse mode (backprop). The whole architecture of training neural networks falls out of this one rule.
← ALL CHAPTERS