02. Vector Search Fundamentals
Chapter 2 of 18 · 20 min
Before diving into indexes, you need to understand what you're actually searching and how distances are measured. The choice of vector representation and distance metric affects everything downstream.
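To see why the metric matters, here is a minimal sketch, assuming three common choices (Euclidean distance, cosine similarity, and dot product). The function name `nearest_by_metric` and the sample vectors are hypothetical, chosen only so that each metric picks a different "nearest" candidate for the same query.

```python
import numpy as np

def nearest_by_metric(query, candidates):
    """Return the index of the best-matching candidate under three metrics."""
    euclidean = np.linalg.norm(candidates - query, axis=1)  # smaller is closer
    dots = candidates @ query                                # larger is closer
    cosine = dots / (np.linalg.norm(candidates, axis=1) * np.linalg.norm(query))
    return {
        "euclidean": int(np.argmin(euclidean)),
        "cosine": int(np.argmax(cosine)),
        "dot": int(np.argmax(dots)),
    }

# Illustrative data (not from the chapter)
query = np.array([1.0, 0.0])
candidates = np.array([
    [0.9, 0.3],  # close in space, slightly off-direction
    [3.0, 0.0],  # exactly the same direction, but farther away
    [4.0, 3.0],  # large magnitude, different direction
])
winners = nearest_by_metric(query, candidates)
print(winners)
```

Each metric selects a different candidate here: Euclidean distance favors the point that is close in space, cosine similarity favors the one pointing the same way regardless of length, and the dot product rewards magnitude. Whichever metric you choose, the index built on top of it inherits that notion of "nearest."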
EXERCISE
Generate random vectors in 2D, 16D, 64D, and 256D. For each dimensionality, compute the ratio between the 10th-nearest-neighbor distance and the nearest-neighbor distance, taking the median of each across all points. Watch how this ratio shrinks toward 1 as dimensionality increases: the nearest neighbor ends up barely closer than the 10th, which is why search becomes harder.
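import numpy as np

def nearest_ratio(dim, n_points=1000):
    """Median 10th-NN distance divided by median 1st-NN distance."""
    vectors = np.random.rand(n_points, dim)
    # Brute-force pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b,
    # which avoids building an (n_points, n_points, dim) array (about 2 GB at 256D).
    sq_norms = np.sum(vectors ** 2, axis=1)
    sq_dists = sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :] - 2 * vectors @ vectors.T
    dists = np.sqrt(np.maximum(sq_dists, 0.0))  # clip tiny negatives from rounding
    np.fill_diagonal(dists, np.inf)  # exclude each point's distance to itself
    sorted_dists = np.sort(dists, axis=1)
    first_dist = np.median(sorted_dists[:, 0])  # typical nearest-neighbor distance
    tenth_dist = np.median(sorted_dists[:, 9])  # typical 10th-nearest-neighbor distance
    return tenth_dist / first_dist

for dim in [2, 16, 64, 256]:
    ratio = nearest_ratio(dim)
    print(f"Dim {dim:3d}: 10th/1st ratio = {ratio:.4f}")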