14. Multi-Tier Indexing
Multi-tier indexing separates data by access frequency and query requirements. Hot data lives in fast storage; cold data on slower, cheaper tiers. This is essential for scaling beyond single-machine memory.
Tier Architecture
┌─────────────────────────────────────────────────────────┐
│ Tier 0: Hot Index (HNSW in RAM) │
│ - Top 1-10% of most queried vectors │
│ - Sub-millisecond query latency │
│ - High memory cost │
├─────────────────────────────────────────────────────────┤
│ Tier 1: Warm Index (HNSW on NVMe) │
│ - Next 10-30% of vectors │
│ - 1-10ms query latency │
│ - Lower cost, larger capacity │
├─────────────────────────────────────────────────────────┤
│ Tier 2: Cold Storage (PQ + IVF on object storage) │
│ - Remaining vectors │
│ - 10-100ms query latency │
│ - Commodity storage costs │
└─────────────────────────────────────────────────────────┘
Routing Strategy
Queries cascade through tiers until recall requirements are met:
from dataclasses import dataclass
from typing import Protocol
import numpy as np
class VectorIndex(Protocol):
def search(self, query: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]: ...
@dataclass
class TierConfig:
name: str
index: VectorIndex
size: int # number of vectors in this tier
recall_weight: float # fraction of total recall this tier contributes
class MultiTierSearch:
def __init__(self, tiers: list[TierConfig]):
self.tiers = sorted(tiers, key=lambda t: t.size, reverse=True) # smallest first
self.total_vectors = sum(t.size for t in tiers)
self.cumulative_sizes = np.cumsum([t.size for t in tiers])
def search(self, query: np.ndarray, k: int,
target_recall: float = 0.95) -> tuple[np.ndarray, np.ndarray]:
"""
Search through tiers until target recall is achieved.
Returns approximate top-k results.
"""
results = {} # index -> (distance, original_id)
seen_ids = set()
accumulated = 0
for tier in self.tiers:
tier_results, tier_ids = tier.index.search(query, k + len(seen_ids))
# Merge new results
for dist, idx in zip(tier_results, tier_ids):
global_idx = accumulated + idx
if global_idx not in seen_ids:
results[global_idx] = (dist, global_idx)
seen_ids.add(global_idx)
accumulated += tier.size
# Check if we have enough results
if len(results) >= k:
sorted_results = sorted(results.values(), key=lambda x: x[0])
top_k = sorted_results[:k]
return np.array([r[0] for r in top_k]), np.array([r[1] for r in top_k])
# Fallback: return what we have
sorted_results = sorted(results.values(), key=lambda x: x[0])
return np.array([r[0] for r in sorted_results[:k]]), np.array([r[1] for r in sorted_results[:k]])
Tier Promotion
Hot vectors get promoted to faster tiers based on query frequency:
from collections import deque
import threading
class TierManager:
def __init__(self, hot_threshold: int = 100, promotion_window: int = 3600):
self.query_counts = {} # vector_id -> query count in window
self.window_seconds = promotion_window
self.hot_threshold = hot_threshold
self.lock = threading.Lock()
self.promotion_queue = deque()
def record_query(self, vector_id: int):
"""Record that a vector was returned in a query result."""
with self.lock:
self.query_counts[vector_id] = self.query_counts.get(vector_id, 0) + 1
def check_promotions(self) -> list[int]:
"""
Return list of vector IDs that should be promoted to hot tier.
Should be called periodically (e.g., every minute).
"""
with self.lock:
candidates = [
vid for vid, count in self.query_counts.items()
if count >= self.hot_threshold
]
self.promotion_queue.extend(candidates)
self.query_counts.clear()
return list(self.promotion_queue)
Failure Modes
Tier boundary recall loss: Vectors near tier boundaries may have different distances in each tier's index. Always re-rank unified results against the original vectors when precision matters.
Promotion lag: Newly hot vectors wait for the promotion cycle. If query patterns change rapidly (e.g., trending content), consider synchronous promotion with a latency budget.
Implement a 2-tier system with tier 0 containing 10% of vectors and tier 1 containing 90%. Measure end-to-end recall at different k values (1, 10, 100) comparing single-tier (tier 1 only) vs multi-tier search. Quantify the recall cost of tiering.
# Expected: recall cost increases with smaller k
# Reason: top results are more likely in hot tier, but misses are more costly