15. Geographic Distribution
Chapter 15 of 24 · 15 min
Enterprise RAG systems often serve global users under latency targets below 100 ms. Geographic distribution replicates vector indices and caches across regions so that each request is answered from a nearby copy, which minimizes round-trip time.
The architecture uses a primary-secondary replication model:
import os

from redis import Redis


class GeoDistributedCache:
    def __init__(self):
        # decode_responses=True makes get() return str instead of bytes
        self.primary = Redis(host='us-east-1.primary.internal', port=6379,
                             decode_responses=True)
        self.replicas = {
            "eu-west-1": Redis(host='eu-west-1.replica.internal', port=6379,
                               decode_responses=True),
            "ap-southeast-1": Redis(host='ap-southeast-1.replica.internal', port=6379,
                                    decode_responses=True),
        }

    def read_from_nearest(self, query: str) -> str | None:
        # Read from the replica in the caller's region; fall back to the
        # primary when that region has no replica (e.g. us-east-1)
        regional_endpoint = self._resolve_endpoint()
        replica = self.replicas.get(regional_endpoint, self.primary)
        return replica.get(query)

    def _resolve_endpoint(self) -> str:
        # Simplified: in production, derive the region from request headers,
        # a geographic IP lookup, or client-side routing
        return os.environ.get("REGION", "us-east-1")
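Vector index distribution requires more careful handling, because a replicated keyspace alone does not give a region a usable vector similarity index: each region needs the search module loaded and its own index built over its local data. Use a multi-index approach, with one index per region: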
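import numpy as np
from redis import Redis
from redis.commands.search.field import TagField, TextField, VectorField
from redis.commands.search.query import Query
from redis.exceptions import ResponseError

INDEX_NAME = "idx:chunks"


class DistributedVectorIndex:
    def __init__(self, regions: list[dict], primary: str):
        # Assumption: `primary` names the region that accepts writes
        self.regions = regions
        self.index_map = {r["name"]: self._create_region_index(r)
                          for r in regions}
        self.primary = self.index_map[primary]

    def _create_region_index(self, region: dict):
        r = Redis(host=region["host"], port=region["port"])
        index = r.ft(INDEX_NAME)
        try:
            index.info()  # raises ResponseError if the index does not exist yet
        except ResponseError:
            # HNSW and COSINE are illustrative choices; no key prefix is set,
            # so every hash in the database is indexed
            index.create_index([
                VectorField("embedding", "HNSW", {
                    "TYPE": "FLOAT64", "DIM": 384, "DISTANCE_METRIC": "COSINE"}),
                TextField("chunk_id"),
                TagField("region_tags"),
            ])
        return r

    def write_to_primary(self, chunk_id: str, embedding: np.ndarray):
        # Hashes are indexed automatically on write; the vector bytes must
        # match the TYPE declared on the vector field
        self.primary.hset(f"chunk:{chunk_id}", mapping={
            "chunk_id": chunk_id,
            "embedding": embedding.astype(np.float64).tobytes(),
        })
        # Background sync to replicas

    def search_local(self, region: str, query_emb: np.ndarray, k: int = 10):
        local_index = self.index_map[region]
        # KNN queries require query dialect 2 or later
        query = (Query(f"*=>[KNN {k} @embedding $vec AS score]")
                 .sort_by("score")
                 .dialect(2))
        return local_index.ft(INDEX_NAME).search(
            query,
            query_params={"vec": query_emb.astype(np.float64).tobytes()},
        )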
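The background sync step that write_to_primary leaves as a comment could be handled in several ways. The following is a minimal sketch of one option, an application-level fan-out that copies a written hash to each regional index and reports which regions failed. The function name sync_to_replicas and the HashWriter protocol are hypothetical; the only client method assumed is hset, which redis.Redis provides.

```python
from typing import Any, Mapping, Protocol


class HashWriter(Protocol):
    """Minimal client surface assumed here (hypothetical name)."""

    def hset(self, name: str, mapping: Mapping[str, Any]) -> Any: ...


def sync_to_replicas(
    key: str,
    fields: Mapping[str, Any],
    regional_clients: Mapping[str, HashWriter],
) -> list[str]:
    """Copy one already-written hash to every regional index.

    Returns the regions whose write failed so the caller can retry them.
    """
    failed = []
    for region, client in regional_clients.items():
        try:
            client.hset(key, mapping=fields)
        except Exception:
            # Broad on purpose for the sketch; a real system would catch
            # the client's connection and timeout errors specifically
            failed.append(region)
    return failed
```

In practice this would run from a queue or worker rather than inline with the write, so that a slow region does not delay the caller. Failed regions stay stale until the retry succeeds, which is the eventual-consistency window the rest of this chapter deals with.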
Failure Modes:
- Replication lag: Writes in US-East may take 50-500ms to appear in EU-West. Within a single session, a user can receive stale results from one query and fresher results from the next.
- Split-brain on write failures: A regional node becomes unreachable, or a network partition leaves two nodes accepting writes, and the regions diverge. Implement fallback routing to the primary and accept the elevated latency.
- Vector drift: Embedding models are updated in different regions at different times, so the same query produces different vectors and semantic search becomes inconsistent across regions. Pin model versions in container images.
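For the replication lag case, one common mitigation is to send a session's reads to the primary for a short window after that session writes. The sketch below illustrates the idea only. SessionRouter and its method names are hypothetical, and the 0.5 s default simply mirrors the upper end of the 50-500ms range quoted above.

```python
import time
from typing import Callable


class SessionRouter:
    """Send a session's reads to the primary for a short window after it writes."""

    def __init__(self, max_lag_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic):
        self.max_lag_s = max_lag_s  # assumed upper bound on replication lag
        self._clock = clock         # injectable so the logic can be tested
        self._last_write: dict[str, float] = {}

    def record_write(self, session_id: str) -> None:
        self._last_write[session_id] = self._clock()

    def read_target(self, session_id: str, local_region: str) -> str:
        """Return "primary" while the session's last write may still be in flight."""
        last = self._last_write.get(session_id)
        if last is not None and self._clock() - last < self.max_lag_s:
            return "primary"
        return local_region
```

Only sessions that have just written pay the cross-region latency. Every other session keeps reading from its local replica.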
DNS-based regional routing (api.eu-west-1.rag.internal) combined with health checks provides automatic failover.
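DNS failover happens outside the application, but the same decision can be sketched client-side to show the logic: try the caller's regional endpoint, then fall back to the primary region if a health check fails. The function names, the is_healthy callback, and the choice of us-east-1 as the fallback are assumptions for illustration. Only the api.<region>.rag.internal hostname pattern comes from the text.

```python
from typing import Callable

PRIMARY_REGION = "us-east-1"  # assumed fallback region


def endpoint_for(region: str) -> str:
    # Follows the api.<region>.rag.internal pattern
    return f"api.{region}.rag.internal"


def resolve_endpoint(
    client_region: str,
    is_healthy: Callable[[str], bool],
    fallback_order: tuple[str, ...] = (PRIMARY_REGION,),
) -> str:
    """Return the client's regional endpoint, or the first healthy fallback."""
    for region in (client_region, *fallback_order):
        host = endpoint_for(region)
        if is_healthy(host):
            return host
    raise RuntimeError("no healthy endpoint available")
```

Here is_healthy stands in for whatever probe the deployment uses, such as a TCP connect or an HTTP health route. With DNS-based routing the equivalent check runs in the DNS provider, and clients simply resolve a single name.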
EXERCISE
Configure Redis replication between two regions. Measure replication lag by writing a timestamped entry at the primary and checking when it appears at the replica.
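As a starting point for the measurement, the sketch below writes a unique marker to the primary and polls the replica until the marker appears. It assumes only that both clients expose set and get, as redis.Redis does. The function name, the lag:probe key, and the default timings are hypothetical.

```python
import time
from typing import Optional


def measure_replication_lag(primary, replica, key: str = "lag:probe",
                            timeout_s: float = 5.0,
                            poll_interval_s: float = 0.001) -> Optional[float]:
    """Write a timestamped marker to the primary and time its arrival at the replica.

    Returns the observed lag in seconds, or None if the marker did not
    appear within timeout_s.
    """
    # Unique value, so a leftover probe from an earlier run is not mistaken for this one
    marker = str(time.time_ns())
    start = time.monotonic()
    primary.set(key, marker)
    while time.monotonic() - start < timeout_s:
        value = replica.get(key)
        # redis-py returns bytes unless decode_responses=True
        if isinstance(value, bytes):
            value = value.decode()
        if value == marker:
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None
```

The result includes the round trip from the measuring host to both nodes plus the polling interval, so treat it as an upper bound on the true lag. Running the probe from the replica's region and repeating it many times gives a more useful distribution than a single reading.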