06. Real-Time Indexing
Real-time indexing means new documents become searchable within seconds of upload. This requirement conflicts with the batch-oriented nature of vector indexing: many vector databases rebuild or update indexes in ways that can take minutes.
The indexing latency stack has four stages: document processing, embedding generation, the vector database write, and index refresh. Each stage adds latency, so a document uploaded at T=0 might not appear in search results until T=30 to 60 seconds.
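To see which stage dominates, it helps to time each one separately. The following is a minimal sketch, not a production tracer: `StageTimer` is a hypothetical helper, and the `time.sleep` calls are stand-ins for real processing, embedding, write, and refresh work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Collects wall-clock duration per pipeline stage for one document."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raises, so slow failures show up too.
            self.durations[name] = time.perf_counter() - start

    def total(self) -> float:
        """End-to-end indexing latency: the sum of all recorded stages."""
        return sum(self.durations.values())

    def slowest(self) -> str:
        """Name of the stage that contributed the most latency."""
        return max(self.durations, key=lambda name: self.durations[name])


# The sleeps below are placeholders with made-up durations.
timer = StageTimer()
with timer.stage("processing"):
    time.sleep(0.005)
with timer.stage("embedding"):
    time.sleep(0.05)
with timer.stage("db_write"):
    time.sleep(0.005)
with timer.stage("index_refresh"):
    time.sleep(0.01)

print(f"total={timer.total():.3f}s slowest={timer.slowest()}")
```

Recording per-stage durations rather than a single end-to-end number makes it possible to tell whether a latency regression comes from the embedding service, the database write, or the index refresh.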
HNSW indexes (used by most vector databases) handle incremental updates imperfectly. Adding a vector requires rewiring graph edges, which can temporarily degrade search quality for concurrent queries. You face a tradeoff: update immediately (faster indexing, temporarily degraded search) or batch updates (better search consistency, delayed indexing).
# Incremental vs batched indexing tradeoffs
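# Assumed to exist elsewhere: Chunk, embedding_service, vector_db, _flush_batch.
pending_chunks: list[Chunk] = []
BATCH_SIZE = 100  # illustrative value; tune for your workload


async def index_document_incremental(chunk: Chunk) -> None:
    """Index immediately; concurrent searches may be briefly degraded."""
    vector = await embedding_service.embed(chunk.text)
    await vector_db.insert(
        id=chunk.id,
        vector=vector,
        metadata=chunk.metadata,
    )
    # Graph rewiring happens here; concurrent searches may see inconsistent results.


async def index_document_batched(chunk: Chunk) -> None:
    """Accumulate chunks in memory and index them in batches."""
    pending_chunks.append(chunk)
    if len(pending_chunks) >= BATCH_SIZE:
        await _flush_batch()  # expected to write and clear pending_chunks
        await vector_db.refresh_index()
    # A partial batch stays unindexed until it fills or is flushed elsewhere.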
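A purely size-based batch has a gap: if uploads slow down, a partial batch can sit in memory indefinitely. One common remedy is to flush on size or on age, whichever comes first. The sketch below assumes a hypothetical `BoundedBatcher` class and uses an in-memory `fake_flush` in place of a real bulk insert and index refresh.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


class BoundedBatcher:
    """Flushes when the batch is full or when the oldest item has waited too long."""

    def __init__(
        self,
        flush: Callable[[List[str]], Awaitable[None]],
        batch_size: int = 100,
        max_wait_s: float = 2.0,
    ) -> None:
        self._flush = flush
        self._batch_size = batch_size
        self._max_wait_s = max_wait_s
        self._pending: List[str] = []
        self._oldest = 0.0
        self._lock = asyncio.Lock()

    async def add(self, item: str) -> None:
        async with self._lock:
            if not self._pending:
                self._oldest = time.monotonic()  # age is measured from the first item
            self._pending.append(item)
            if len(self._pending) >= self._batch_size:
                await self._drain()

    async def tick(self) -> None:
        """Call periodically; flushes a partial batch that has aged out."""
        async with self._lock:
            aged = time.monotonic() - self._oldest >= self._max_wait_s
            if self._pending and aged:
                await self._drain()

    async def _drain(self) -> None:
        batch, self._pending = self._pending, []
        try:
            await self._flush(batch)
        except Exception:
            self._pending = batch  # keep the items so a later flush can retry
            raise


async def demo() -> List[List[str]]:
    flushed: List[List[str]] = []

    async def fake_flush(batch: List[str]) -> None:
        flushed.append(batch)  # stand-in for a bulk insert plus index refresh

    batcher = BoundedBatcher(fake_flush, batch_size=3, max_wait_s=0.05)
    for chunk_id in ["a", "b", "c", "d"]:
        await batcher.add(chunk_id)  # "a", "b", "c" flush on size
    await asyncio.sleep(0.1)
    await batcher.tick()  # "d" flushes on age
    return flushed


print(asyncio.run(demo()))
```

With this shape, `max_wait_s` becomes an upper bound on how long batching alone can delay a document, which makes it easier to reason about against an indexing SLA. In a real service, `tick` would run from a background task on a fixed interval.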
Index refresh strategies vary by database. Qdrant supports point updates without a full index rebuild. Weaviate implements eventual consistency with configurable replication. Milvus relies on periodic compaction.
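# Milvus index compaction scheduling
# Illustrative request only: the endpoint path and JSON fields are placeholders,
# not a documented Milvus API. Compaction is typically triggered through an SDK
# call (for example, Collection.compact() in pymilvus); check the docs for your version.
curl -X PUT "http://milvus:9091/indexmanager/compact" \
  -H "Content-Type: application/json" \
  -d '{"schedule_interval": 3600, "threshold": 10000}'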
Consistency guarantees matter for compliance. If an attorney uploads a legal document and searches for it immediately, the system must return that document. This requires synchronous indexing: block the upload response until the vector is searchable.
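One way to approximate synchronous indexing is to write the vector and then poll a read path until the document is visible, with a deadline so the upload cannot hang forever. The sketch below is built on assumptions: `FakeVectorDB`, `wait_until_searchable`, and `upload_sync` are hypothetical names, and the fake database simulates a refresh delay in memory. Some databases offer a native "wait for write" option that would replace the polling loop.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Tuple


async def wait_until_searchable(
    is_searchable: Callable[[], Awaitable[bool]],
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.05,
) -> bool:
    """Poll the probe until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if await is_searchable():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(poll_interval_s)


class FakeVectorDB:
    """In-memory stand-in: inserted IDs become visible after a refresh delay."""

    def __init__(self, refresh_delay_s: float) -> None:
        self._visible_at: Dict[str, float] = {}
        self._delay = refresh_delay_s

    async def insert(self, doc_id: str) -> None:
        self._visible_at[doc_id] = time.monotonic() + self._delay

    async def contains(self, doc_id: str) -> bool:
        ready = self._visible_at.get(doc_id)
        return ready is not None and time.monotonic() >= ready


async def upload_sync(db: FakeVectorDB, doc_id: str, timeout_s: float) -> str:
    """Respond only after the document is searchable, or report that it is not."""
    await db.insert(doc_id)
    ok = await wait_until_searchable(lambda: db.contains(doc_id), timeout_s=timeout_s)
    return "indexed" if ok else "accepted_not_yet_searchable"


async def demo() -> Tuple[str, str]:
    fast = await upload_sync(FakeVectorDB(0.1), "contract-1", timeout_s=1.0)
    slow = await upload_sync(FakeVectorDB(5.0), "contract-2", timeout_s=0.2)
    return fast, slow


print(asyncio.run(demo()))
```

The important design choice is that a timeout produces a distinct status instead of a success response. The caller can then tell the user the document is not yet searchable, instead of letting a search silently miss it.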
The failure mode to watch for: indexing appears to succeed, but search returns nothing. This happens when the vector insert succeeds but the write to the metadata index (the document ID → chunk IDs mapping) fails silently. The vector exists but is unreachable through normal queries.
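Two defenses address this failure: undo the vector insert when the metadata write fails, and periodically scan for vectors that no document points to. The sketch below uses in-memory stand-ins. `InMemoryStores`, `index_chunk`, and `find_orphans` are hypothetical names, not the API of any particular database.

```python
from typing import Dict, List, Set


class MetadataWriteError(Exception):
    """Raised when the document-to-chunk mapping cannot be written."""


class InMemoryStores:
    """Stand-ins for a vector store and a document-to-chunk metadata index."""

    def __init__(self, fail_metadata: bool = False) -> None:
        self.vectors: Dict[str, List[float]] = {}
        self.doc_to_chunks: Dict[str, Set[str]] = {}
        self.fail_metadata = fail_metadata

    def insert_vector(self, chunk_id: str, vector: List[float]) -> None:
        self.vectors[chunk_id] = vector

    def delete_vector(self, chunk_id: str) -> None:
        self.vectors.pop(chunk_id, None)

    def link_chunk(self, doc_id: str, chunk_id: str) -> None:
        if self.fail_metadata:
            raise MetadataWriteError(f"could not link {chunk_id} to {doc_id}")
        self.doc_to_chunks.setdefault(doc_id, set()).add(chunk_id)


def index_chunk(stores: InMemoryStores, doc_id: str, chunk_id: str,
                vector: List[float]) -> None:
    """Write the vector, then the mapping; undo the vector if the mapping fails."""
    stores.insert_vector(chunk_id, vector)
    try:
        stores.link_chunk(doc_id, chunk_id)
    except Exception:
        stores.delete_vector(chunk_id)  # compensate so no unreachable vector remains
        raise  # surface the failure instead of reporting success


def find_orphans(stores: InMemoryStores) -> Set[str]:
    """Return chunk IDs that have a vector but no document mapping."""
    linked: Set[str] = set().union(*stores.doc_to_chunks.values())
    return set(stores.vectors) - linked


healthy = InMemoryStores()
index_chunk(healthy, "doc-1", "doc-1#0", [0.1, 0.2])

broken = InMemoryStores(fail_metadata=True)
try:
    index_chunk(broken, "doc-2", "doc-2#0", [0.3, 0.4])
except MetadataWriteError as exc:
    print(f"indexing failed loudly: {exc}")

# Simulate the silent failure: a vector written with no mapping at all.
broken.insert_vector("doc-3#0", [0.5, 0.6])
print(find_orphans(healthy), find_orphans(broken))
```

The compensation step covers failures the indexer can observe. The orphan scan covers the ones it cannot, such as a crash between the two writes.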
Design a monitoring dashboard for indexing latency. What metrics would you track to detect when indexing delays exceed your SLA? Include both infrastructure metrics and business-level indicators.