Geographic Distribution — Enterprise-Scale RAG (Chapter 15)

Enterprise RAG systems often serve global users under latency budgets below 100 ms. Geographic distribution replicates vector indices and caches across regions so that each request is served close to the user, minimizing round-trip time.

The architecture uses a primary-secondary replication model:

import os

from redis import Redis

class GeoDistributedCache:
    def __init__(self):
        # decode_responses=True makes get() return str instead of bytes
        self.primary = Redis(host='us-east-1.primary.internal', port=6379,
                             decode_responses=True)
        self.replicas = {
            "eu-west-1": Redis(host='eu-west-1.replica.internal', port=6379,
                               decode_responses=True),
            "ap-southeast-1": Redis(host='ap-southeast-1.replica.internal', port=6379,
                                    decode_responses=True),
        }

    def read_from_nearest(self, query: str) -> str | None:
        # In production, resolve the region via GeoIP lookup or geo-aware DNS
        regional_endpoint = self._resolve_endpoint()
        # Regions without a replica (including us-east-1) read from the primary
        replica = self.replicas.get(regional_endpoint, self.primary)
        return replica.get(query)

    def _resolve_endpoint(self) -> str:
        # Simplified: in production use request headers or client-side routing
        return os.environ.get("REGION", "us-east-1")
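
The comment in _resolve_endpoint mentions resolving the region from request headers. A minimal sketch of that idea follows. The header name X-Client-Region and the helper resolve_region are hypothetical, chosen only for illustration; a real deployment would more likely trust a header set by its own load balancer or CDN.

```python
# Regions that have a cache endpoint in this chapter's example.
KNOWN_REGIONS = {"us-east-1", "eu-west-1", "ap-southeast-1"}


def resolve_region(headers: dict[str, str], default: str = "us-east-1") -> str:
    """Pick a region from a request header, falling back to the default.

    Assumption: the edge layer sets a (hypothetical) X-Client-Region header.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    candidate = normalized.get("x-client-region", "").strip().lower()
    # Unknown or missing values route to the default (primary) region.
    return candidate if candidate in KNOWN_REGIONS else default
```

Returning the default for unrecognized values keeps the lookup in read_from_nearest safe, because any region without a replica already falls through to the primary.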

Vector index distribution requires more sophisticated handling, because key-level replication alone does not guarantee that every region holds a complete, queryable similarity search index at any given moment. Use a multi-index approach, with one index per region:

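import numpy as np
from redis import Redis
from redis.commands.search.field import TagField, TextField, VectorField
from redis.commands.search.query import Query
from redis.exceptions import ResponseError

INDEX_NAME = "idx:chunks"

class DistributedVectorIndex:
    def __init__(self, regions: list[dict]):
        self.regions = regions
        self.index_map = {r["name"]: self._create_region_index(r)
                          for r in regions}
        # Assumption: the first region in the list is the write primary
        self.primary = self.index_map[regions[0]["name"]]

    def _create_region_index(self, region: dict):
        r = Redis(host=region["host"], port=region["port"])
        try:
            # Each region builds its own index over its local hashes.
            # HNSW and COSINE are assumptions; the text does not name them.
            r.ft(INDEX_NAME).create_index([
                VectorField("embedding", "HNSW", {
                    "TYPE": "FLOAT64", "DIM": 384, "DISTANCE_METRIC": "COSINE",
                }),
                TextField("chunk_id"),
                TagField("region_tags"),
            ])
        except ResponseError:
            pass  # most commonly: the index already exists in this region
        return r

    def write_to_primary(self, chunk_id: str, embedding: np.ndarray):
        # Store the chunk as a hash; the index picks up new hashes automatically
        self.primary.hset(f"chunk:{chunk_id}", mapping={
            "chunk_id": chunk_id,
            "embedding": embedding.astype(np.float64).tobytes(),
        })
        # Background sync to replicas

    def search_local(self, region: str, query_emb: np.ndarray, k: int = 10):
        local_index = self.index_map[region]
        # KNN query syntax requires query dialect 2
        query = (Query(f"*=>[KNN {k} @embedding $vec AS score]")
                 .sort_by("score")
                 .return_fields("chunk_id", "score")
                 .dialect(2))
        results = local_index.ft(INDEX_NAME).search(
            query,
            query_params={"vec": query_emb.astype(np.float64).tobytes()}
        )
        return results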
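
The "Background sync to replicas" step is left open above. One way it could work is a per-region queue that replays writes in order and retries a region that is temporarily unreachable. The sketch below is an assumption and not the chapter's implementation: ReplicaSyncer and InMemoryStore are hypothetical names, and InMemoryStore stands in for a regional Redis client so the example runs without a server. With redis-py, the exception to catch would be redis.exceptions.ConnectionError rather than the built-in ConnectionError used here.

```python
from collections import deque


class InMemoryStore:
    """Stand-in for a regional Redis client; only hset is modeled."""

    def __init__(self, available: bool = True):
        self.available = available
        self.data: dict[str, dict] = {}

    def hset(self, name: str, mapping: dict) -> int:
        if not self.available:
            raise ConnectionError("region unreachable")
        self.data[name] = dict(mapping)
        return len(mapping)


class ReplicaSyncer:
    """Queues writes per region and replays them in arrival order."""

    def __init__(self, replicas: dict):
        self.replicas = replicas
        self.pending = {name: deque() for name in replicas}

    def enqueue(self, key: str, fields: dict) -> None:
        # Every region gets its own copy of the pending write.
        for queue in self.pending.values():
            queue.append((key, fields))

    def flush(self, region: str) -> int:
        """Apply queued writes to one region; return how many were applied."""
        queue = self.pending[region]
        applied = 0
        while queue:
            key, fields = queue[0]
            try:
                self.replicas[region].hset(key, mapping=fields)
            except ConnectionError:
                break  # keep the write queued and retry on the next flush
            queue.popleft()
            applied += 1
        return applied
```

A write is removed from a region's queue only after that region accepts it, so an outage delays a region without losing or reordering its writes. In this sketch the queues live in process memory, which means pending writes would not survive a restart.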

Failure Modes:

Replication lag: Writes in US-East may take 50-500ms to appear in EU-West. Within a single session, a user may see stale results followed moments later by fresher ones.
Split-brain on write failures: A regional replica becomes unavailable for writes, so regions can disagree on the latest state. Implement fallback routing to the primary, accepting the elevated latency.
Vector drift: If embedding models are updated in different regions at different times, the same query is embedded by different model versions, making semantic search results inconsistent across regions. Pin model versions in container images.
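
The fallback routing mentioned above can be sketched as a read helper that tries the regional replica first and goes to the primary when the replica is unreachable. The function read_with_fallback is hypothetical. Treating a replica miss as "possibly not replicated yet" and retrying on the primary is an added assumption that also softens the replication-lag case, at the cost of extra primary traffic.

```python
from typing import Callable, Optional


def read_with_fallback(
    key: str,
    replica_get: Callable[[str], Optional[str]],
    primary_get: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Return (value, source), where source is 'replica' or 'primary'."""
    try:
        value = replica_get(key)
        if value is not None:
            return value, "replica"
    except ConnectionError:
        pass  # replica unreachable; fall through to the primary
    # Assumption: a miss may mean the write has not replicated yet,
    # so the primary is asked before reporting the key as absent.
    return primary_get(key), "primary"
```

Returning the source alongside the value lets the caller count how often the slower primary path is taken, which is a useful signal that a region is unhealthy or lagging.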

DNS-based regional routing (for example, api.eu-west-1.rag.internal), combined with health checks, provides automatic failover: when a region's endpoint fails its checks, traffic is routed to another region.
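
In practice this failover usually lives in the DNS provider or load balancer. The same logic can be sketched on the client side, assuming an ordered list of candidate hostnames. The helpers pick_endpoint and tcp_health_check are hypothetical, as are any hostnames other than api.eu-west-1.rag.internal and the choice of port 443.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_health_check(host: str, port: int = 443, timeout: float = 0.5) -> bool:
    """Report whether a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False


def pick_endpoint(
    candidates: Sequence[str],
    is_healthy: Callable[[str], bool] = tcp_health_check,
) -> Optional[str]:
    """Return the first healthy endpoint in preference order, or None."""
    for host in candidates:
        if is_healthy(host):
            return host
    return None
```

Listing the nearest region first and the primary last gives each client a preference order, so failover only adds latency when the preferred region is actually down. A TCP connect is a coarse check; a production health check would more likely call an application-level endpoint.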