HNSW Parameters — Vector Database Internals (Chapter 10)

The Hierarchical Navigable Small World (HNSW) algorithm controls its behavior through a handful of parameters that directly determine memory usage, build time, and query latency. Understanding these knobs is essential for tuning a production vector database.

The Core Parameters

M (max connections per layer) controls the density of the graph. Higher M values create denser graphs with better recall but consume more memory and slow insertion. Typical values range from 8 to 64. At M=16, expect roughly 0.5 bytes per dimension in memory overhead. At M=64, that jumps to 2-3x.

efConstruction governs the dynamic list size during index building. This parameter directly affects build time and index quality. Higher values produce better graphs but extend build time quadratically. A value of 200-400 is common for production workloads.

efSearch determines the search exploration during queries. Lower values mean faster queries but potentially missed results. Set this based on your recall requirements. If you need 95%+ recall, you may need efSearch=200 or higher, even if 50 is sufficient for 90%.

layerAlpha (in some implementations) controls the exponential decay factor for layer assignment. The standard formula assigns element i to layer l with probability exp(-l * layerAlpha). The default 1/ln(dim) balances skip-list structure with graph connectivity.

Layer Distribution Mathematics

With default parameters, the expected number of layers for n elements follows ln(n). For 10 million vectors, expect approximately 16 layers. The top layer contains only a handful of elements (roughly n/M), while the bottom layer holds the full dataset.

import random
import math

def calculate_layer_distribution(n_vectors: int, m: int, layer_alpha: float = 1.0) -> dict:
    """
    Predict layer distribution for HNSW parameters.
    Returns expected element count per layer.
    """
    layers = {}
    total_elements = n_vectors
    
    for layer in range(int(math.log(n_vectors)) + 1):
        # Probability an element reaches this layer
        p_layer = math.exp(-layer * layer_alpha)
        expected_count = n_vectors * p_layer
        
        # Top layer has ~1/M elements
        if layer == 0:
            expected_count = max(1, total_elements // m)
        
        if expected_count >= 1:
            layers[layer] = int(expected_count)
    
    return layers

# Example: 10M vectors, M=16
dist = calculate_layer_distribution(10_000_000, m=16)
for layer, count in dist.items():
    print(f"Layer {layer}: ~{count:,} elements")

Memory Estimation

Memory(bytes) ≈ n_vectors × (d × 4 + M × 4 + overhead)
              ≈ n_vectors × (d × 4 + M × 8)   # conservative estimate

For 1M vectors at 768 dimensions with M=32: roughly 4-6 GB.

Failure Modes

Setting M too high causes cache thrashing on large datasets. Setting it too low produces disconnected graph regions, visible as sudden recall drops. Always benchmark recall at your target efSearch value before production deployment.