Vector Database Internals

09. HNSW Search

Chapter 9 of 18 · 25 min

KEY INSIGHT

HNSW search is a guided greedy walk. At each layer, you examine local neighbors and move to the one closest to your query. This greedy approach works because the graph is constructed to be navigable—neighbors are arranged so greedy walks reach good solutions. ### Search Algorithm ```python def hnsw_search(hnsw, query: np.ndarray, ef: int = 10, k: int = 10) -> List[int]: """ Search HNSW for k nearest neighbors. Args: hnsw: HNSW index query: query vector ef: search width (larger = more thorough = higher recall) k: number of neighbors to return Returns: List of k nearest node IDs """ if hnsw.entry_point is None: return [] # Get top layer top_layer = max(hnsw.entry_point.neighbors.keys()) if hnsw.entry_point.neighbors else 0 # Phase 1: Descend from top to layer 1 current = hnsw.entry_point for layer in range(top_layer, 0, -1): improved = True while improved: improved = False current_dist = np.linalg.norm(current.vector - query) for neighbor_id in current.neighbors.get(layer, []): neighbor = hnsw.nodes[neighbor_id] dist = np.linalg.norm(neighbor.vector - query) if dist < current_dist: current = neighbor improved = True break # Phase 2: Search layer 0 with ef parameter candidates = [(np.linalg.norm(current.vector - query), current)] visited = {current.id} results = [] # (distance, node) while candidates: dist, node = heapq.heappop(candidates) heapq.heappush(results, (-dist, node)) # Max heap via negative # If we have enough results and this candidate is worse, stop if len(results) >= ef and dist > -results[0][0]: break for neighbor_id in node.neighbors.get(0, []): if neighbor_id in visited: continue visited.add(neighbor_id) neighbor = hnsw.nodes[neighbor_id] dist = np.linalg.norm(neighbor.vector - query) # Add if promising if len(results) < ef or dist < -results[0][0]: heapq.heappush(candidates, (dist, neighbor)) heapq.heappush(results, (-dist, neighbor)) # Extract top k results.sort(reverse=True) return [node.id for _, node in results[:k]] ``` ### The ef Parameter ef controls the search width at the bottom layer. Higher ef = more candidates examined = higher recall = slower search. The relationship: - ef=1: Pure greedy, very fast, poor recall - ef=10-50: Good balance for most use cases - ef=100+: Near-perfect recall, but approaching brute force speed ```python def benchmark_ef(hnsw, queries, ground_truth): """Measure recall vs ef parameter.""" results = [] for ef in [1, 5, 10, 25, 50, 100, 200]: total_recall = 0 for query, gt in zip(queries, ground_truth): retrieved = hnsw_search(hnsw, query, ef=ef, k=10) recall = len(set(retrieved) & set(gt)) / 10 total_recall += recall avg_recall = total_recall / len(queries) results.append((ef, avg_recall)) print(f"ef={ef:3d}: recall@10 = {avg_recall:.4f}") return results # Run benchmark recall_by_ef = benchmark_ef(hnsw, queries[:100], gt) ``` ### Why ef Works With ef > 1, you maintain a pool of candidates rather than just tracking the best. This lets you explore alternatives that might lead to better solutions. The key insight: the best solution at layer 0 might not be reachable from the best node at layer 1. By examining more candidates, you find paths that pure greedy would miss. ### Trade-offs | ef | Recall | Latency | Memory | |----|--------|---------|--------| | 1 | ~0.7 | Fastest | Low | | 10 | ~0.95 | Baseline | Baseline | | 100 | ~0.99 | 5-10x baseline | Baseline | | 1000 | ~1.0 | Approaches brute force | Baseline | Memory is constant because HNSW stores the graph structure; ef only affects search-time computation.

HNSW search starts at the top layer and greedily traverses edges toward the query, descending layer by layer until reaching layer 0.

EXERCISE

Build an HNSW index and measure recall vs latency for different ef values. Plot the curve and identify the "knee" where increasing ef gives diminishing recall returns. This is typically where you'd set ef for production.

import time

def plot_recall_latency(hnsw, queries, gt):
    """Plot recall vs latency curve."""
    latencies = []
    recalls = []
    
    for ef in [1, 2, 5, 10, 20, 50, 100, 200]:
        # Measure latency
        times = []
        for query in queries:
            start = time.time()
            hnsw_search(hnsw, query, ef=ef, k=10)
            times.append((time.time() - start) * 1000)
        avg_latency = np.mean(times)
        
        # Measure recall
        total_recall = 0
        for query, gt_result in zip(queries, gt):
            retrieved = hnsw_search(hnsw, query, ef=ef, k=10)
            total_recall += len(set(retrieved) & set(gt_result)) / 10
        avg_recall = total_recall / len(queries)
        
        latencies.append(avg_latency)
        recalls.append(avg_recall)
        print(f"ef={ef:3d}: {avg_latency:6.2f}ms, recall={avg_recall:.4f}")
    
    plt.figure(figsize=(10, 6))
    plt.plot(latencies, recalls, 'o-')
    plt.xlabel('Latency (ms)')
    plt.ylabel('Recall@10')
    plt.title('HNSW Recall vs Latency Trade-off')
    plt.grid(True)
    plt.show()

plot_recall_latency(hnsw, queries[:200], gt[:200])

This completes Chapters 1-9 of Vector Database Internals. You now understand why vector databases exist, how brute force works, how IVF partitions and searches vectors, and how HNSW builds a navigable graph structure. In the following chapters, we'll cover HNSW parameters, quantization techniques, and production deployment considerations.