RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
  1. >
  2. Home
  3. /Learn
  4. /Courses
  5. /Vector Database Internals
  6. /Ch. 7
Vector Database Internals

07. HNSW: Hierarchical Navigable Small World

Chapter 7 of 18 · 20 min
KEY INSIGHT

HNSW is essentially a highway system for vectors. The top layer is sparse highways that get you close to your destination fast. The bottom layer is dense local roads that find the exact nearest neighbors. ### The Multi-Layer Graph HNSW builds a layered graph where: - Layer 0 contains all vectors (dense) - Layer 1 contains ~50% of vectors (sparse) - Layer 2 contains ~25% of vectors (sparsest) - And so on... Search starts at the top layer, finds the closest node, and descends to the next layer, repeating until layer 0. ```python import numpy as np from dataclasses import dataclass from typing import List, Dict, Set @dataclass class Node: id: int vector: np.ndarray neighbors: Dict[int, Set[int]] # layer -> set of neighbor node IDs def __hash__(self): return hash(self.id) class HNSW: """ Simplified HNSW implementation for educational purposes. Production systems use optimized structures and additional pruning. """ def __init__(self, m: int = 16, max_layers: int = None): self.m = m # Number of connections per node per layer self.max_layers = max_layers self.nodes: Dict[int, Node] = {} self.entry_point: Node = None def _get_layers(self, num_vectors: int) -> List[int]: """ Determine layer for each node. Layer selection follows geometric distribution: P(layer > L) = exp(-L / lambda) Lambda controls average number of layers. """ if self.max_layers is None: # Auto-calculate based on size self.max_layers = int(np.log2(num_vectors)) + 1 # Geometric distribution for layer assignment lamb = 1.0 / np.log(2) # Average ~1 layer per node layers = [] for _ in range(num_vectors): l = 0 while l < self.max_layers and np.random.random() < 1.0 / np.exp(l / lamb): l += 1 layers.append(l - 1) if l > 0 else layers.append(0) return layers def add(self, vectors: np.ndarray): """Add vectors to HNSW index.""" if len(self.nodes) == 0: # First node becomes entry point first_node = Node(id=0, vector=vectors[0], neighbors={}) for layer in range(self.max_layers or 1): first_node.neighbors[layer] = set() self.nodes[0] = first_node self.entry_point = first_node vectors = vectors[1:] node_layers = self._get_layers(len(vectors)) for i, vec in enumerate(vectors): node_id = len(self.nodes) node = Node(id=node_id, vector=vec, neighbors={}) # Determine max layer for this node max_layer = node_layers[i] for layer in range(max_layer + 1): node.neighbors[layer] = set() # Search for insertion point at this layer entry = self.entry_point for l in range(len(self.nodes), -1, -1): if l > max_layer: continue best = self._search_layer(entry, vec, layer, ef=1)[0] entry = best # Find neighbors and connect neighbors = self._search_layer(entry, vec, layer, ef=self.m) for neighbor in neighbors[:self.m]: node.neighbors[layer].add(neighbor.id) neighbor.neighbors[layer].add(node.id) self.nodes[node_id] = node def _search_layer(self, entry: Node, query: np.ndarray, layer: int, ef: int = 1) -> List[Node]: """ Search within a single layer. ef: search width (number of candidates to keep) """ visited = {entry.id} candidates = [(0, entry)] # (distance, node) best = entry while candidates: # Get furthest element _, current = heapq.heappop(candidates) # Get node at max layer if current.neighbors.get(layer): for neighbor_id in current.neighbors[layer]: if neighbor_id in visited: continue visited.add(neighbor_id) neighbor = self.nodes[neighbor_id] dist = np.linalg.norm(neighbor.vector - query) if dist < np.linalg.norm(best.vector - query): best = neighbor # Add to candidates if promising heapq.heappush(candidates, (dist, neighbor)) return best ``` ### Why HNSW Works HNSW exploits the small world phenomenon: in high-dimensional spaces, local neighborhood and long-range connections coexist. By organizing vectors into a navigable small world graph, you can traverse from anywhere to anywhere in logarithmic time. The top layers provide "long-range" connections that skip large portions of the space. The bottom layers provide local precision.

HNSW is a graph-based ANN method that builds a multi-layer structure where upper layers enable fast skipping to the right region, and lower layers refine the search.

EXERCISE

Build a 2D visualization of an HNSW graph. Plot nodes, color by layer, draw edges. Observe how upper layers have sparser connectivity and longer edges.

import matplotlib.pyplot as plt
import numpy as np

def visualize_hnsw(hnsw):
    """Visualize HNSW structure in 2D."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    for ax, layer in zip(axes, [0, 1, 2]):
        positions = []
        for node in hnsw.nodes.values():
            if layer in node.neighbors:
                pos = node.vector[:2]  # First 2 dimensions
                positions.append(pos)
                for neighbor_id in node.neighbors[layer]:
                    neighbor = hnsw.nodes[neighbor_id]
                    ax.plot([pos[0], neighbor.vector[0]], 
                           [pos[1], neighbor.vector[1]], 
                           alpha=0.1, color='gray')
        
        positions = np.array(positions)
        ax.scatter(positions[:, 0], positions[:, 1], s=10)
        ax.set_title(f'Layer {layer}')
    
    plt.show()

# Generate test data
vectors = np.random.randn(500, 128).astype('float32')
hnsw = HNSW(m=8)
hnsw.add(vectors)
visualize_hnsw(hnsw)
← Chapter 6
IVF Parameters
Chapter 8 →
HNSW Construction