RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Enterprise-Scale RAG

06. Real-Time Indexing

Chapter 6 of 24 · 15 min
KEY INSIGHT

Real-time indexing is achievable but requires explicit architecture. You need synchronous write confirmation, metadata-index synchronization, and index refresh monitoring. Without these, you'll ship a system where "documents uploaded today don't appear until tomorrow."

Real-time indexing means new documents become searchable within seconds of upload. That requirement cuts against the batch-oriented design of vector indexing: most vector databases build or refresh their indexes in steps that can take minutes.

The indexing latency stack includes: document processing time, embedding generation time, vector database write time, and index refresh time. Each stage adds latency. A document uploaded at T=0 might not appear in search results until T=30-60 seconds.
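
To make the stack concrete, the sketch below adds up per-stage latencies and reports which stage dominates. The stage names mirror the list above; the millisecond figures and the 5-second freshness target are made-up placeholders, not measurements, and the stages are assumed to run one after another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageLatency:
    name: str
    millis: float


def time_to_searchable(stages: list[StageLatency]) -> float:
    """Total delay from upload to searchable, assuming stages run sequentially."""
    return sum(stage.millis for stage in stages)


def slowest_stage(stages: list[StageLatency]) -> StageLatency:
    """The stage that contributes the most latency, i.e. the first place to look."""
    return max(stages, key=lambda stage: stage.millis)


# Hypothetical numbers for one document; replace with your own measurements.
EXAMPLE_STAGES = [
    StageLatency("document_processing", 800.0),
    StageLatency("embedding_generation", 1200.0),
    StageLatency("vector_db_write", 150.0),
    StageLatency("index_refresh", 30000.0),
]

if __name__ == "__main__":
    total_ms = time_to_searchable(EXAMPLE_STAGES)
    target_ms = 5000.0  # assumed freshness target, not a recommendation
    print(f"total: {total_ms / 1000:.1f}s, over target: {total_ms > target_ms}")
    print(f"slowest stage: {slowest_stage(EXAMPLE_STAGES).name}")
```

With numbers like these, index refresh dwarfs everything else, so tuning the embedding service would barely move the total.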

HNSW indexes (used by most vector databases) handle incremental updates less gracefully than bulk builds. Each added vector has to be wired into the graph, and that rewiring can temporarily degrade search quality. You face a tradeoff: update immediately (faster indexing, temporarily degraded search) or batch updates (better search consistency, delayed indexing).

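# Incremental vs batched indexing tradeoffs
# embedding_service and vector_db are placeholder async clients, not a specific library's API
from dataclasses import dataclass, field

@dataclass
class Chunk:
    id: str
    text: str
    metadata: dict = field(default_factory=dict)

BATCH_SIZE = 100  # illustrative value; tune against your freshness target
pending_chunks: list[Chunk] = []

async def index_document_incremental(chunk: Chunk) -> None:
    """Updates immediately but may cause search degradation"""
    vector = await embedding_service.embed(chunk.text)
    await vector_db.insert(
        id=chunk.id,
        vector=vector,
        metadata=chunk.metadata
    )
    # Graph rewiring happens here; concurrent searches may see inconsistent results

async def index_document_batched(chunk: Chunk) -> None:
    """Accumulates in memory, indexes in batches"""
    pending_chunks.append(chunk)
    if len(pending_chunks) >= BATCH_SIZE:
        await _flush_batch()  # assumed helper: writes the batch and clears pending_chunks
        await vector_db.refresh_index()
    # Until the buffer fills, buffered chunks are not searchable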
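
The batched version above flushes only when the buffer fills, so a quiet period can leave a partial batch unindexed indefinitely. One common mitigation is to flush on size or age, whichever comes first. The sketch below is synchronous for brevity; `BatchBuffer`, `max_size`, `max_age_seconds`, and the `flush_fn` callback are hypothetical names, and the clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class BatchBuffer(Generic[T]):
    """Buffers items and flushes when the batch is full or its oldest item is too old."""

    def __init__(
        self,
        flush_fn: Callable[[list[T]], None],
        max_size: int = 100,
        max_age_seconds: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush_fn = flush_fn
        self._max_size = max_size
        self._max_age = max_age_seconds
        self._clock = clock
        self._items: list[T] = []
        self._oldest_at: Optional[float] = None

    def add(self, item: T) -> None:
        if not self._items:
            # Remember when the current batch started aging.
            self._oldest_at = self._clock()
        self._items.append(item)
        self.flush_if_due()

    def flush_if_due(self) -> bool:
        """Call from add() and from a periodic timer so idle batches still drain."""
        if not self._items:
            return False
        too_big = len(self._items) >= self._max_size
        too_old = self._clock() - self._oldest_at >= self._max_age
        if too_big or too_old:
            batch, self._items, self._oldest_at = self._items, [], None
            self._flush_fn(batch)
            return True
        return False
```

The age limit puts an upper bound on how long any chunk waits in memory, provided something calls `flush_if_due()` on a timer; the index refresh time still adds to the total delay.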

Index refresh strategies vary by database. Qdrant supports point updates without a full index rebuild. Weaviate implements eventual consistency with configurable replication. Milvus relies on periodic compaction.

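# Milvus index compaction scheduling
# NOTE: the endpoint path and payload fields below are illustrative; confirm the compaction API for your Milvus version before relying on them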
curl -X PUT "http://milvus:9091/indexmanager/compact" \
  -H "Content-Type: application/json" \
  -d '{"schedule_interval": 3600, "threshold": 10000}'

Consistency guarantees matter for compliance. If a legal document is uploaded and immediately searched by an attorney, the system must return the document. This requires synchronous indexing: block the upload response until the vector is searchable, and accept slower uploads as the price.
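
One way to get that blocking behaviour is to write, then poll until a read confirms the document is visible, and only then acknowledge the upload. The sketch below uses an in-memory stand-in with a simulated refresh delay; `InMemoryIndex`, `is_searchable`, and `upload_and_wait` are hypothetical names, not any database's API.

```python
import asyncio
import time


class InMemoryIndex:
    """Stand-in for a vector database whose writes become visible after a delay."""

    def __init__(self, refresh_delay: float) -> None:
        self._refresh_delay = refresh_delay
        self._visible_at: dict[str, float] = {}

    async def insert(self, doc_id: str) -> None:
        self._visible_at[doc_id] = time.monotonic() + self._refresh_delay

    async def is_searchable(self, doc_id: str) -> bool:
        visible_at = self._visible_at.get(doc_id)
        return visible_at is not None and time.monotonic() >= visible_at


async def upload_and_wait(
    index: InMemoryIndex,
    doc_id: str,
    timeout: float = 5.0,
    poll_interval: float = 0.01,
) -> bool:
    """Insert, then block until the document is searchable or the timeout expires.

    Returns True only when a read confirmed the write. On False the caller
    should report failure to the uploader rather than success.
    """
    await index.insert(doc_id)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await index.is_searchable(doc_id):
            return True
        await asyncio.sleep(poll_interval)
    return False


if __name__ == "__main__":
    confirmed = asyncio.run(upload_and_wait(InMemoryIndex(0.05), "doc-1", timeout=1.0))
    print("searchable before ack:", confirmed)
```

If your database offers a write option that blocks until the operation is applied, prefer it over polling; the timeout and the explicit failure path are still worth keeping.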

A common failure mode: indexing appears to succeed, but search returns nothing. This happens when the vector insert succeeds but the write to the metadata index (the document ID → chunk IDs mapping) fails silently. The vector exists but is unreachable through normal queries.
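
A minimal sketch of two guards against this: treat the vector insert and the mapping write as one unit, undoing the vector when the mapping fails, and run a reconciliation pass that lists vectors no document points to. Plain dicts stand in for both stores, and every name here (`IndexStores`, `index_chunk`, `find_orphaned_chunks`) is hypothetical.

```python
class MappingWriteError(Exception):
    """Raised when the document ID -> chunk IDs mapping cannot be written."""


class IndexStores:
    """Dict-backed stand-ins for the vector store and the metadata index."""

    def __init__(self, fail_mapping_writes: bool = False) -> None:
        self.vectors: dict[str, list[float]] = {}
        self.doc_to_chunks: dict[str, set[str]] = {}
        self._fail_mapping_writes = fail_mapping_writes

    def write_mapping(self, doc_id: str, chunk_id: str) -> None:
        if self._fail_mapping_writes:
            raise MappingWriteError(f"mapping write failed for {doc_id}")
        self.doc_to_chunks.setdefault(doc_id, set()).add(chunk_id)


def index_chunk(
    stores: IndexStores, doc_id: str, chunk_id: str, vector: list[float]
) -> None:
    """Write vector and mapping together; undo the vector if the mapping fails."""
    stores.vectors[chunk_id] = vector
    try:
        stores.write_mapping(doc_id, chunk_id)
    except MappingWriteError:
        # Compensate so the failure is loud and leaves no unreachable vector.
        del stores.vectors[chunk_id]
        raise


def find_orphaned_chunks(stores: IndexStores) -> set[str]:
    """Chunk IDs that have a vector but no document pointing at them."""
    mapped: set[str] = set().union(*stores.doc_to_chunks.values())
    return set(stores.vectors) - mapped
```

The compensation step covers failures the writer can see; the reconciliation pass catches the silent ones, so it is worth running on a schedule and alerting when the orphan count is non-zero.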

EXERCISE

Design a monitoring dashboard for indexing latency. What metrics would you track to detect when indexing delays exceed your SLA? Include both infrastructure metrics and business-level indicators.
