Multi-Tier Indexing — Vector Database Internals (Chapter 14)

Multi-tier indexing separates data by access frequency and query requirements. Hot data lives in fast storage; cold data on slower, cheaper tiers. This is essential for scaling beyond single-machine memory.

Tier Architecture

┌─────────────────────────────────────────────────────────┐
│  Tier 0: Hot Index (HNSW in RAM)                       │
│  - Top 1-10% of most queried vectors                   │
│  - Sub-millisecond query latency                       │
│  - High memory cost                                    │
├─────────────────────────────────────────────────────────┤
│  Tier 1: Warm Index (HNSW on NVMe)                     │
│  - Next 10-30% of vectors                              │
│  - 1-10ms query latency                                │
│  - Lower cost, larger capacity                         │
├─────────────────────────────────────────────────────────┤
│  Tier 2: Cold Storage (PQ + IVF on object storage)     │
│  - Remaining vectors                                   │
│  - 10-100ms query latency                              │
│  - Commodity storage costs                             │
└─────────────────────────────────────────────────────────┘

Routing Strategy

Queries cascade through tiers until recall requirements are met:

from dataclasses import dataclass
from typing import Protocol
import numpy as np

class VectorIndex(Protocol):
    def search(self, query: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]: ...

@dataclass
class TierConfig:
    name: str
    index: VectorIndex
    size: int  # number of vectors in this tier
    recall_weight: float  # fraction of total recall this tier contributes

class MultiTierSearch:
    def __init__(self, tiers: list[TierConfig]):
        self.tiers = sorted(tiers, key=lambda t: t.size, reverse=True)  # smallest first
        self.total_vectors = sum(t.size for t in tiers)
        self.cumulative_sizes = np.cumsum([t.size for t in tiers])
    
    def search(self, query: np.ndarray, k: int, 
               target_recall: float = 0.95) -> tuple[np.ndarray, np.ndarray]:
        """
        Search through tiers until target recall is achieved.
        Returns approximate top-k results.
        """
        results = {}  # index -> (distance, original_id)
        seen_ids = set()
        accumulated = 0
        
        for tier in self.tiers:
            tier_results, tier_ids = tier.index.search(query, k + len(seen_ids))
            
            # Merge new results
            for dist, idx in zip(tier_results, tier_ids):
                global_idx = accumulated + idx
                if global_idx not in seen_ids:
                    results[global_idx] = (dist, global_idx)
                    seen_ids.add(global_idx)
            
            accumulated += tier.size
            
            # Check if we have enough results
            if len(results) >= k:
                sorted_results = sorted(results.values(), key=lambda x: x[0])
                top_k = sorted_results[:k]
                return np.array([r[0] for r in top_k]), np.array([r[1] for r in top_k])
        
        # Fallback: return what we have
        sorted_results = sorted(results.values(), key=lambda x: x[0])
        return np.array([r[0] for r in sorted_results[:k]]), np.array([r[1] for r in sorted_results[:k]])

Tier Promotion

Hot vectors get promoted to faster tiers based on query frequency:

from collections import deque
import threading

class TierManager:
    def __init__(self, hot_threshold: int = 100, promotion_window: int = 3600):
        self.query_counts = {}  # vector_id -> query count in window
        self.window_seconds = promotion_window
        self.hot_threshold = hot_threshold
        self.lock = threading.Lock()
        self.promotion_queue = deque()
    
    def record_query(self, vector_id: int):
        """Record that a vector was returned in a query result."""
        with self.lock:
            self.query_counts[vector_id] = self.query_counts.get(vector_id, 0) + 1
    
    def check_promotions(self) -> list[int]:
        """
        Return list of vector IDs that should be promoted to hot tier.
        Should be called periodically (e.g., every minute).
        """
        with self.lock:
            candidates = [
                vid for vid, count in self.query_counts.items()
                if count >= self.hot_threshold
            ]
            
            self.promotion_queue.extend(candidates)
            self.query_counts.clear()
        
        return list(self.promotion_queue)

Failure Modes

Tier boundary recall loss: Vectors near tier boundaries may have different distances in each tier's index. Always re-rank unified results against the original vectors when precision matters.

Promotion lag: Newly hot vectors wait for the promotion cycle. If query patterns change rapidly (e.g., trending content), consider synchronous promotion with a latency budget.