Norms & distance

Section 1.3

Norms & distance

A norm core term A function ‖·‖ assigning a non-negative length to a vector, obeying scaling and the triangle inequality. Different norms measure 'length' differently. Then → now: you used 'magnitude' or 'Euclidean length.' Now the plural matters — L1, L2, L∞ are distinct tools, and which one a system uses is a real engineering decision (Ch. 25 shows TurboQuant supports L2 but not L1, for a reason rooted in §1.3). answers “how long is this vector?” — and the surprise for returning engineers is that there’s more than one defensible answer. The dot product already handed us one: a vector’s length is the square root of its dot product with itself.

‖a‖₂ = √⟨a, a⟩ = √(Σᵢ aᵢ²) # the L2 (Euclidean) norm

That’s the L2 norm — ordinary straight-line length, the one your geometry intuition already holds. But it’s a member of a family, the L_p norms:

‖a‖ₚ = ( Σᵢ |aᵢ|^p )^(1/p) # p=1 : L1 = Σ|aᵢ| taxicab / Manhattan distance — sum of absolute coords # p=2 : L2 = √Σaᵢ² Euclidean — straight line # p=∞ : L∞ = maxᵢ|aᵢ| Chebyshev — the single largest coordinate

Distance is the norm of a difference. The Euclidean distance between two points is ‖a − b‖₂, and here’s the identity that ties this whole chapter together — distance, expanded, is built entirely from dot products and norms:

‖a − b‖² = ⟨a−b, a−b⟩ = ‖a‖² + ‖b‖² − 2⟨a,b⟩

Stare at that for a second, because it’s not just algebra — it’s an engineering result. If you’ve precomputed and stored ‖a‖² and ‖b‖², then Euclidean distance costs you exactly one dot product plus two lookups. This is precisely the trick that lets TurboQuant capstone ref The rotation-based quantizer from our earlier conversation. It stores each vector's norm so that L2 and dot distance both reduce to one dot product against the compressed code. Ch. 25. You'll see the formula ‖q−v‖² = ‖q‖²+‖v‖²−2‖v‖‖q‖⟨q̂,v̂⟩ in Qdrant's writeup — it's this identity, with the norms factored out and stored. You now have the math under it. store one scalar norm per vector and reconstruct full L2 distance from compressed codes.

Why TurboQuant supports L2 and dot but not L1: rotation preserves ‖·‖₂ exactly (it’s built from the rotation-invariant dot product) but mangles ‖·‖₁. The norm you pick decides whether you’re allowed to rotate — a §1.3 fact with Ch. 25 consequences.

The norm, vectorized

An L2 norm is just a dot product of a vector with itself, then a square root — so we already have the hard part. The squared norm reuses the FMA loop verbatim. Here is the AVX version straight from the compiled source tree:

sqnorm_avx (from dot_avx.c) AVX2 + FMA · x86-64

    __m256 acc = _mm256_setzero_ps();
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        acc = _mm256_fmadd_ps(va, va, acc);    /* square each lane */
    }
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    lo = _mm_add_ps(lo, hi);
    lo = _mm_hadd_ps(lo, lo);
    lo = _mm_hadd_ps(lo, lo);
    float s = _mm_cvtss_f32(lo);
    for (; i < n; i++) s += a[i] * a[i];
    return s;                                  /* caller does sqrtf() iff needed */
}

And the NEON twin running natively on this Apple Silicon machine — half the lane width, but vaddvq_f32 collapses 4 lanes to 1 scalar in a single instruction instead of the x86 dance:

sqnorm_neon (from dot_neon.c) NEON · arm64

float sqnorm_neon(const float* a, int n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        acc = vfmaq_f32(acc, va, va);
    }
    float s = vaddvq_f32(acc);
    for (; i < n; i++) s += a[i] * a[i];
    return s;
}

The “skip the sqrt” note is a real production technique, not a micro-optimization curiosity. Nearest-neighbor search compares distances; it never needs their absolute values. Since √ is monotonic, ‖x‖² < ‖y‖² iff ‖x‖ < ‖y‖ — so HNSW and friends rank candidates on squared distance and pay for the square root zero times. Recognizing when an expensive monotonic step can be dropped is exactly the systems instinct this book wants to connect to the math.

— think, then check —

Expand: ‖q−v‖² = ‖q‖² + ‖v‖² − 2⟨q,v⟩. The term ‖q‖² is the same for every v, so it doesn’t affect ranking — drop it. ‖v‖² is precomputed and stored. So per database vector you compute exactly one dot product ⟨q,v⟩ (your dot_avx kernel), then rank on ‖v‖² − 2⟨q,v⟩. No square roots, no recomputed norms. That single fused-multiply-add loop, run across millions of stored vectors, is the hot path of vector search — and when those v are compressed, it becomes the TurboQuant scoring kernel from our conversation. Everything in Ch. 25 is this recall answer, optimized.

↳ draws on §1.1 (rotation), §1.2 (dot = similarity), §1.3 (distance identity)