01. Why Build a Vector DB?

Chapter 1 of 18 · 15 min

KEY INSIGHT

Vector databases exist because exact nearest neighbor search over high-dimensional vectors becomes too expensive at scale, and the approximate methods that replace it trade recall for speed in ways that matter enormously in production.

The core problem: given 10 million embeddings (from images, text, audio, or any deep learning model), find the k closest vectors to a query. A naive approach compares the query against every stored vector, which costs O(n·d) time for n vectors of dimension d. With 10M 768-dimensional float32 vectors, that is 7.68 billion multiply-add operations per query. That's approximately never acceptable for an interactive query path.

Approximate Nearest Neighbor (ANN) indexes solve this by accepting that you don't need the *exact* answer, just a *good enough* answer that's fast. The trade-off is parameterized: you control how much accuracy you sacrifice for speed.

Three core techniques power modern vector databases:

**Inverted File Index (IVF)** partitions your vector space into clusters, each represented by a centroid. At query time, you find the clusters whose centroids are closest to the query and search only those. The parameter `nprobe` controls how many clusters you check: higher `nprobe` means higher recall and slower queries.

**Hierarchical Navigable Small World (HNSW)** builds a multi-layer graph structure. Upper layers are sparse and let you jump toward your target region quickly. Lower layers are dense and refine the search. Think of it as a highway system for vectors.

**Product Quantization (PQ)** compresses vectors by splitting them into subvectors, clustering each subvector position independently, and storing each subvector as the ID of its nearest centroid. Depending on the configuration, this can reduce memory footprint by roughly 10-100x, and distances can be computed directly on the compressed codes (on CPU or GPU) at the cost of some accuracy.
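
The numbers above are easy to check by hand. The short sketch below reproduces the per-query operation count and adds the raw storage cost for the same corpus. The PQ setting (96 one-byte codes per vector) is a hypothetical choice made only for illustration, not something the chapter prescribes, and the estimate ignores the small codebooks PQ also has to store.

```python
# Back-of-envelope numbers for exact search over the corpus described above.
n_vectors = 10_000_000
dim = 768
bytes_per_float32 = 4

# One brute-force query touches every component of every stored vector.
ops_per_query = n_vectors * dim

# Raw storage for the uncompressed float32 vectors.
raw_bytes = n_vectors * dim * bytes_per_float32

# Hypothetical PQ setting (an assumption for illustration):
# 96 subvectors per vector, each stored as a 1-byte code.
pq_subvectors = 96
pq_bytes_per_vector = pq_subvectors * 1
compression = (dim * bytes_per_float32) / pq_bytes_per_vector

print(f"component operations per query: {ops_per_query:,}")
print(f"raw float32 storage: {raw_bytes / 1e9:.2f} GB")
print(f"PQ codes: {n_vectors * pq_bytes_per_vector / 1e9:.2f} GB "
      f"({compression:.0f}x smaller)")
```

With these assumptions the raw vectors need about 30 GB of memory while the PQ codes fit in under 1 GB, which is why compression often decides whether an index fits on a single machine.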

You're probably here because someone told you to use a vector database, and you want to understand what actually happens when you call `add` or `search`. Smart move. The gap between "it works" and "I understand why it's slow" is where production incidents live.
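
To make the `nprobe` trade-off from the key insight concrete, here is a toy IVF index in pure Python. It is a teaching sketch under stated assumptions, not how FAISS or any production library implements IVF: the function names (`build_ivf`, `ivf_search`, `recall_at_k`), the tiny dataset, and the five k-means iterations are all illustrative choices.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force(vectors, query, k):
    """Exact top-k: score every vector, O(n * d) per query."""
    scored = sorted((sq_dist(v, query), i) for i, v in enumerate(vectors))
    return [i for _, i in scored[:k]]

def assign(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def build_ivf(vectors, nlist, iters=5, seed=0):
    """Toy IVF build: a few k-means (Lloyd) iterations, then inverted lists."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(iters):
        lists = assign(vectors, centroids)
        for c, members in enumerate(lists):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [
                    sum(vectors[i][j] for i in members) / len(members)
                    for j in range(dim)
                ]
    return centroids, assign(vectors, centroids)

def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    scored = sorted((sq_dist(vectors[i], query), i) for i in candidates)
    return [i for _, i in scored[:k]]

# Small synthetic dataset (assumed sizes, chosen so pure Python stays fast).
rng = random.Random(42)
dim, n, nlist, k = 8, 2000, 16, 10
vectors = [[rng.random() for _ in range(dim)] for _ in range(n)]
queries = [[rng.random() for _ in range(dim)] for _ in range(20)]
centroids, lists = build_ivf(vectors, nlist)

def recall_at_k(nprobe):
    """Fraction of the exact top-k that the IVF search also returns."""
    hits = 0
    for q in queries:
        exact = set(brute_force(vectors, q, k))
        approx = set(ivf_search(vectors, centroids, lists, q, k, nprobe))
        hits += len(exact & approx)
    return hits / (k * len(queries))

for nprobe in (1, 4, nlist):
    print(f"nprobe={nprobe:2d}  recall@{k}={recall_at_k(nprobe):.2f}")
```

Probing more clusters can only add candidates, so recall never drops as `nprobe` grows, and probing all `nlist` clusters degenerates into the exact brute-force scan. The cost is that each extra cluster adds more distance computations per query.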

EXERCISE

Install FAISS or usearch and index 100k random vectors. Run a search and note the latency. Then index 1M vectors and note how latency changes. You won't understand why yet—but you'll have a baseline.

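# Minimal FAISS index creation
import time

import faiss
import numpy as np

# Create 100k 128-dim vectors (pretend these are embeddings)
vectors = np.random.rand(100000, 128).astype('float32')

# Build a brute-force (exact) index to establish a baseline
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

# Query, timing the search so you have a latency number to compare later
query = np.random.rand(1, 128).astype('float32')
start = time.perf_counter()
distances, indices = index.search(query, 10)
elapsed_ms = (time.perf_counter() - start) * 1000

# IndexFlatL2 reports squared L2 distances
print(f"Search latency: {elapsed_ms:.2f} ms")
print(f"Top 10 indices: {indices[0]}")
print(f"Top 10 distances: {distances[0]}")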