The curse of dimensionality

Section 6.1

The curse of dimensionality

Two-dimensional and three-dimensional geometric intuition is a liar. It tells you that a ball “fills” the cube around it; that “near” and “far” are well-defined; that randomly placed points scatter usefully across a region. None of this remains true above 20 dimensions, and the failure modes intensify as you climb. Your nearest-neighbour search degenerates, your clustering algorithm finds nothing, your kernel density estimate underflows. The curse of dimensionality is the umbrella term for this collapse — and it’s the structural reason every vector-search system you’ll touch (HNSW, IVF-PQ, RaBitQ, TurboQuant — Ch.25) is an approximate nearest-neighbour algorithm, not an exact one. This section names two faces of the curse and runs them in code so you can feel the magnitudes.

The corners eat the volume

The first face: in a d-dimensional unit hypercube [0, 1]^d, almost all the volume is in the corners. The inscribed ball — the largest ball that fits inside — shrinks to a vanishing fraction of the cube as d grows.

Quantitatively, the volume of the unit-radius ball in d dimensions is:

V_d = π^(d/2) / Γ(d/2 + 1) (Γ = the gamma function) V_1 = 2, V_2 = π, V_3 = 4π/3, V_4 ≈ 4.93, V_5 ≈ 5.26 …

For a cube of side 1 the volume is always 1, regardless of dimension. The inscribed ball (radius 1/2) inside that unit cube has volume V_d × (1/2)^d = V_d / 2^d. That ratio:

d        V_d              ratio inside cube
1        2                1
2        π = 3.14         0.785
3        4.19             0.524
5        5.26             0.164
10       2.55             0.00249
20       0.026            2.5 × 10⁻⁸
50       1.7 × 10⁻¹³      1.5 × 10⁻²⁸
100      2.4 × 10⁻⁴⁰      1.9 × 10⁻⁷⁰

By d = 100, the ball is a 10⁻⁷⁰ fraction of the cube. There’s no useful sense in which the ball “fills” or even noticeably “is inside” the cube. The corners — and a high-dimensional cube has 2^d of them — have grabbed all the volume.

This also kills naive uniform sampling of a ball in high dimensions: rejection sampling from a cube is 2^d-times wasteful by d = 50. The standard fix is to sample a Gaussian and normalise (Ch.6 §3) — which uses concentration of measure (§6.2) to land you exactly where you’d expect.

Distances concentrate

The second face — and the one that breaks nearest-neighbour search — is that all pairwise distances become indistinguishable. Take N random points uniformly in [0, 1]^d. Compute the maximum and minimum pairwise distance. As d grows, the ratio

(max − min) / mean → 0

The kernel runs this experiment with N = 200 points across a range of dimensions:

d      min dist     max dist     mean dist    (max−min)/mean
2      0.0017       1.299        0.527        2.460
5      0.097        1.675        0.893        1.768
10     0.391        2.184        1.263        1.420
20     0.645        2.694        1.783        1.149
50     1.907        3.824        2.876        0.667
100    3.179        4.947        4.087        0.433
500    8.018        10.027       9.140        0.220

At d = 500, every pair of random points is between 8.0 and 10.0 units apart — the “farthest” and “nearest” differ by only 25% of the mean distance. By the time you’re at the dimensions ML embeddings actually live in (d = 768 for BERT-base, d = 4096 for many transformer hidden states), the discrimination is practically gone. “The nearest neighbour” stops meaning anything robust.

max dimension shown d ≤ 50

ball-in-cube volume ratio (log₁₀ scale)

at d=50, ratio ≈ 1.54e-28

(max − min)/mean pairwise distance (linear)

at d=50, ratio ≈ 0.424

Left: the inscribed-ball volume relative to the unit hypercube crashes by ~70 orders of magnitude going from d=1 to d=100. The unit ball is a vanishing fraction of the unit cube. Right: the relative spread of pairwise distances between random points shrinks toward zero — "nearest neighbour" and "farthest neighbour" become almost interchangeable.

Two faces of the curse: volume concentrates in the corners (so the bounding region is enormously larger than the central ball) and pairwise distances concentrate too (so every random point sees every other one as roughly the same distance away). Exact nearest-neighbour search above d ~ 30 is hopeless.

— think, then check —

The unit ball has volume V_d = π^(d/2) / Γ(d/2 + 1). The unit cube has volume 1. The inscribed ball inside the cube has radius 1/2, so its volume is V_d / 2^d.

That ratio decays super-exponentially: 1 → π/4 ≈ 0.785 → π/6 ≈ 0.524 → 0.308 → 0.164 → 0.0025 (d=10) → 2.5 × 10⁻⁸ (d=20) → 10⁻⁷⁰ (d=100). The ball-to-cube ratio is essentially zero by d = 30 and astronomically small by d = 100.

Geometrically: in high dimensions, almost ALL of a hypercube’s volume sits in the corners, far from any inscribed ball. The cube has 2^d corners, and they win.

↳ §6.1 volume argument

Why ANN, not NN

Combine the two phenomena:

Random points in a high-D region cluster nowhere in particular; the cube’s corners hold all the volume.
Pairwise distances between random points are almost all the same — the relative spread (max − min)/mean goes to zero.

So given a query q, the question “find the database vector with smallest ‖q − v‖” returns something, but the something is barely closer than the second something, which is barely closer than the third, and so on. Floating-point noise alone can flip the ranking among the top hundred candidates. Exact nearest-neighbour search becomes meaningless at the same time it becomes computationally hard.

The engineering response, since the late 1990s, has been to give up on exact NN above d ≈ 20 and build Approximate Nearest Neighbour systems instead:

HNSW (Hierarchical Navigable Small-World graphs): a multi-layer graph where greedy traversal lands near (but not exactly at) the true NN. Used in Pinecone, Weaviate, Qdrant, Faiss, Elasticsearch’s vector path.
IVF-PQ: partition the space (IVF = inverted file), compress vectors per-partition (PQ = product quantization). Lower recall, far less memory.
Rotation-quantization (RaBitQ, TurboQuant): rotate to equalise variance (Ch.5 §3), quantize to bytes (Ch.3 §3), score in SIMD (Ch.3 §3 again). Modern memory-efficient frontier.

The Ch.25 capstone shows all three running. For now, the takeaway is: you cannot evade the curse by being clever about distance metrics or by buying a bigger computer. The right move is to accept approximate answers and engineer the approximation well — which is what every vector-search system you’ll encounter does.

curse.c (key loops) C · two faces of the curse

    *lo = mn; *hi = mx; *avg = sum / cnt;
}

int main(void) {
    printf("(1) Inscribed-ball / hypercube volume ratio  V_d / 2^d\n");
    printf("%-6s %-18s %-18s\n", "d", "ball volume V_d", "ratio inside cube");
    int dims[] = { 1, 2, 3, 4, 5, 8, 10, 16, 20, 30, 50, 100 };
    for (size_t i = 0; i < sizeof(dims)/sizeof(dims[0]); i++) {
        int d = dims[i];
        double vd = ball_volume(d);
        double ratio = ball_in_unit_cube(d);
        printf("%-6d %-18.6g %-18.6g\n", d, vd, ratio);
    }

    printf("\n(2) Pairwise distance concentration (N=200 random uniform points)\n");
    printf("%-6s %-14s %-14s %-14s %-14s\n",
           "d", "min dist", "max dist", "mean dist", "(max−min)/mean");
    int dims2[] = { 2, 5, 10, 20, 50, 100, 500 };
    for (size_t i = 0; i < sizeof(dims2)/sizeof(dims2[0]); i++) {
        int d = dims2[i];
        double lo, hi, avg;
        pairwise_stats(d, 200, &lo, &hi, &avg);
        printf("%-6d %-14.4f %-14.4f %-14.4f %-14.4f\n",
               d, lo, hi, avg, (hi - lo) / avg);
    }

    printf("\n→ ball/cube ratio → 0  : the corners dominate volume in high-D\n");
    printf("→ (max−min)/mean → 0   : every pair of random points has almost the same distance\n");

— think, then check —

The ratio decreases as roughly C/√d. At d = 2 it’s ~2.5; at d = 100 it’s ~0.4; at d = 500 it’s ~0.2. The maximum and minimum pairwise distances become numerically close to the mean — the spread of distances shrinks relative to the typical distance.

Operational consequence: the “nearest” neighbour to a query is barely closer than the “second nearest,” which is barely closer than the third, and so on. The discrimination that makes nearest-neighbour search useful disappears. Adding any noise to the distance computation (quantization, floating-point error) easily flips the ranking among the top candidates.

This is why exact NN is hopeless above d ≈ 20–30 and every vector-database system uses approximate methods (HNSW, IVF-PQ, ANN graphs, RaBitQ/TurboQuant). They don’t try to find the exact nearest neighbour — they find a near neighbour, which is what’s empirically useful anyway.

↳ §6.1 distance concentration

— think, then check —

Curse-of-dimensionality side: At d = 4096, the (max − min)/mean ratio for pairwise distances is on the order of 1/√d ≈ 0.016. The “nearest” and “second-nearest” neighbours are essentially the same distance away; any floating-point noise in the distance computation can flip the ranking among the top hundreds of candidates. There’s no robust meaning to “the exact nearest neighbour” at this dimensionality. Empirically, top-k ANN search achieves the same recall on retrieval benchmarks as exact NN, often within 1%, because the underlying problem doesn’t have a uniquely-correct top-1.

Bandwidth side: Float32 × 4096 = 16 KB per vector. A million-vector database is 16 GB. Computing exact distances against the whole DB for a query needs 16 GB of memory bandwidth per query — bandwidth-bound on any modern CPU. ANN methods (HNSW + quantization) read 100× fewer bytes per query because they (a) prune the search via graph navigation and (b) score against int8-quantized representations (Ch.3 §3) — 4× the lanes per byte, 4× the bandwidth.

Combined: exact NN is the wrong product because it’s expensive AND the “exact” answer it would return isn’t reliably better than what ANN returns. The right product is ANN with good calibration knobs (recall@k targets, latency budgets). That’s what Qdrant, Pinecone, Weaviate, and the rest sell.

↳ §6.1 + Ch.5 §3

END OF CH.6 §1 — The curse of dimensionality.
Built: CurseOfDim viz (slide d, watch ball/cube ratio and distance concentration); curse.c runs both empirically through d=500. Three recall items: easy (volume math), medium (the (max−min)/mean → 0 phenomenon and the consequence for NN), hard (why exact NN is an antiproduct at frontier-model embedding scales).
Coming next: §6.2 — Concentration of measure. The fact that rescues high-D rather than damning it.