From Systems to Frontier ML

I — Foundations Refreshed

01 Vectors, dot products & norms Everything downstream (attention, embeddings, quantization, similarity search) is built from one operation you already know. Let's re-own it, precisely, and end with it running in hardware. draft
02 Matrices as transformations Matrices as functions, not number grids. Matmul as composition. The three independent axes. Orthogonal/rotation matrices. draft
03 Floating point, integers & quantization error IEEE-754 refreshed, fixed-point and integer arithmetic, and where quantization error comes from. Kernel: int8 dot product with _mm_maddubs_epi16. draft
04 Calculus & gradients refreshed Derivatives as sensitivity, the chain rule, Jacobians — quiet setup for backprop. draft

II — Probability, Geometry & Learning

III — The Neural Network, Assembled

IV — What Makes an LLM

V — The Systems That Run Them

VI — The Frontier

S — Spotlight: production accelerators

99 Speculative decoding & multi-token prediction How modern LLM runtimes (llama.cpp, vLLM, TensorRT-LLM) sustain a 2–3× throughput boost without retraining the base model. The math of speculative decoding, the Medusa / EAGLE / MTP family of "extra heads," and a walk through llama.cpp PR #22673. draft