09. HNSW Search
Chapter 9 of 18 · 25 min
HNSW search starts at the top layer and greedily traverses edges toward the query, descending layer by layer until reaching layer 0.
EXERCISE
Build an HNSW index and measure recall vs latency for different ef values. Plot the curve and identify the "knee" where increasing ef gives diminishing recall returns. This is typically where you'd set ef for production.
import time
def plot_recall_latency(hnsw, queries, gt):
"""Plot recall vs latency curve."""
latencies = []
recalls = []
for ef in [1, 2, 5, 10, 20, 50, 100, 200]:
# Measure latency
times = []
for query in queries:
start = time.time()
hnsw_search(hnsw, query, ef=ef, k=10)
times.append((time.time() - start) * 1000)
avg_latency = np.mean(times)
# Measure recall
total_recall = 0
for query, gt_result in zip(queries, gt):
retrieved = hnsw_search(hnsw, query, ef=ef, k=10)
total_recall += len(set(retrieved) & set(gt_result)) / 10
avg_recall = total_recall / len(queries)
latencies.append(avg_latency)
recalls.append(avg_recall)
print(f"ef={ef:3d}: {avg_latency:6.2f}ms, recall={avg_recall:.4f}")
plt.figure(figsize=(10, 6))
plt.plot(latencies, recalls, 'o-')
plt.xlabel('Latency (ms)')
plt.ylabel('Recall@10')
plt.title('HNSW Recall vs Latency Trade-off')
plt.grid(True)
plt.show()
plot_recall_latency(hnsw, queries[:200], gt[:200])
This completes Chapters 1-9 of Vector Database Internals. You now understand why vector databases exist, how brute force works, how IVF partitions and searches vectors, and how HNSW builds a navigable graph structure. In the following chapters, we'll cover HNSW parameters, quantization techniques, and production deployment considerations.