Vector search & ANN

§1 Exact kNN → HNSW small-world graphs
Vector search is the fundamental operation behind RAG, semantic search, recommendation systems, and embedding-based LLM augmentation. Exact kNN over N embeddings is O(N·d) per query — too slow at scale. Approximate methods trade some recall for orders-of-magnitude speedup. HNSW (Malkov 2018) is the dominant graph-based approach: a hierarchical small-world graph navigable via greedy descent. 10-100× speedup over brute force at 90%+ recall.
§2 Product quantization + binary codes
HNSW solves "how to traverse" but doesn‘t reduce per-vector cost. Product Quantization (PQ, Jegou 2011) compresses each vector into a 8-32 byte code, achieving 10-50× memory reduction. Binary embeddings push further: 1 bit per dimension, Hamming-distance search. Combined with HNSW (or IVF — coarse clustering), these form FAISS‘s IVF+PQ recipe — the industry-standard billion-scale ANN system.
§3 RaBitQ and TurboQuant — rotation-based quantization
PQ requires training codebooks per subvector — adds infrastructure cost. Binary quantization is fast but loses recall. RaBitQ (Gao 2024) showed that randomly ROTATING vectors before binary quantization gives provable recall guarantees: the rotated binary code preserves enough geometric structure to recover near-fp32 search quality. TurboQuant extends this with learned rotations. This is the seam in current research — and where the book began.