Data Isolation — Security and Privacy for Local AI (Chapter 6)

Data isolation prevents cross-tenant or cross-project data leakage. In local AI deployments, it matters in three cases: multi-user systems where users must not see each other's conversations, RAG systems where retrieved documents must respect access controls, and shared infrastructure where one team's data must not influence another team's model behavior.

Isolation levels:

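Process isolation runs each AI service in its own process with its own memory space. Docker containers, separate user accounts, or process namespaces provide this. A compromised process cannot directly read another process's memory, provided the two run under different accounts or in separate containers; processes under the same account can often inspect each other with debugging interfaces. Containers still share the host kernel, so a kernel-level exploit can cross this boundary.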
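A separate process has its own address space, but by default it still inherits the parent's environment variables. The sketch below, assuming a POSIX host and only the Python standard library, launches a worker as a separate process with an allowlisted environment so that secrets held by the parent are not passed along. The function name run_isolated, the allowlist, and the API_TOKEN variable are illustrative assumptions, not part of any particular framework.

```python
import os
import subprocess
import sys


def run_isolated(
    args: list[str],
    env_allowlist: tuple[str, ...] = ("PATH", "LANG"),
) -> subprocess.CompletedProcess:
    """Run a command in a separate process with a minimal environment."""
    # Copy only allowlisted variables; everything else stays in the parent.
    env = {k: os.environ[k] for k in env_allowlist if k in os.environ}
    return subprocess.run(
        args,
        env=env,              # replaces the inherited environment entirely
        capture_output=True,
        text=True,
        check=True,           # raise if the child exits non-zero
        timeout=30,
    )


if __name__ == "__main__":
    os.environ["API_TOKEN"] = "parent-only-secret"  # hypothetical secret
    child_code = "import os; print(os.environ.get('API_TOKEN', 'not visible'))"
    result = run_isolated([sys.executable, "-c", child_code])
    print(result.stdout.strip())
```

This only limits what the child inherits. It does not replace running the service under its own account or in its own container.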

Filesystem isolation separates data directories so services can only read files they own. Use separate volumes or directories with restrictive permissions:

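# Create isolated data directories
mkdir -p /srv/ai/tenant-a/data /srv/ai/tenant-b/data
# Assign ownership recursively so each service also owns its data/ subdirectory
chown -R ai-service-a:ai-service-a /srv/ai/tenant-a
chown -R ai-service-b:ai-service-b /srv/ai/tenant-b
# Owner-only access: no other account can list or traverse either tree
chmod 700 /srv/ai/tenant-a /srv/ai/tenant-b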
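Permissions drift over time, so it can help to audit them periodically. The following is a minimal sketch, assuming a POSIX filesystem, that reports tenant directories whose mode grants any access to group or other. The helper names is_owner_only and find_exposed are hypothetical, and the demo uses a temporary directory in place of /srv/ai.

```python
import os
import stat
import tempfile


def is_owner_only(path: str) -> bool:
    """Return True if no group or other permission bits are set on path."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def find_exposed(root: str) -> list[str]:
    """List entries directly under root that group or other can access."""
    return sorted(
        entry.path for entry in os.scandir(root) if not is_owner_only(entry.path)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        tenant_a = os.path.join(root, "tenant-a")
        tenant_b = os.path.join(root, "tenant-b")
        os.mkdir(tenant_a)
        os.mkdir(tenant_b)
        os.chmod(tenant_a, 0o700)  # correctly restricted
        os.chmod(tenant_b, 0o755)  # readable by everyone
        print(find_exposed(root))  # reports only the tenant-b path
```

The check covers only the top-level directories, which is enough when they are mode 700, because no other account can traverse into them.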

Network isolation places services on separate network segments. Use VLANs or firewall rules to block traffic between them except where it is explicitly allowed:

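# iptables rules for service isolation
# Allow replies on connections that were already accepted
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow only necessary communication between services: new connections
# from 10.0.1.0/24 to 10.0.2.0/24, rate-limited to 10 per minute
iptables -A FORWARD -s 10.0.1.0/24 -d 10.0.2.0/24 -m conntrack \
    --ctstate NEW -m limit --limit 10/minute -j ACCEPT
# Drop all other forwarded traffic
iptables -A FORWARD -j DROP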
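The default-deny logic behind such rules can be modeled and unit-tested before it is deployed. The sketch below uses Python's standard ipaddress module to check whether a new connection between two addresses matches an allow rule. The AllowRule class and is_allowed function are illustrative names, and the model covers only source and destination matching, not connection tracking or rate limiting.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AllowRule:
    src: str  # source network in CIDR notation
    dst: str  # destination network in CIDR notation


def is_allowed(src_ip: str, dst_ip: str, rules: list[AllowRule]) -> bool:
    """Default-deny check: permit only flows that match an allow rule."""
    src, dst = ip_address(src_ip), ip_address(dst_ip)
    return any(
        src in ip_network(rule.src) and dst in ip_network(rule.dst)
        for rule in rules
    )


# One-directional rule: segment 10.0.1.0/24 may open connections to 10.0.2.0/24
RULES = [AllowRule("10.0.1.0/24", "10.0.2.0/24")]

if __name__ == "__main__":
    print(is_allowed("10.0.1.5", "10.0.2.9", RULES))  # allowed direction
    print(is_allowed("10.0.2.9", "10.0.1.5", RULES))  # reverse direction
    print(is_allowed("10.0.3.1", "10.0.2.9", RULES))  # unrelated segment
```

Only the first call matches the rule. The reverse direction and the unrelated segment fall through to the implicit deny.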

Namespace isolation uses Linux namespaces to give a service its own view of process IDs, the network stack, mount points, and user IDs:

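# Create isolated namespaces for an AI service
# --map-root-user maps the caller to root inside the new user namespace;
# --mount-proc remounts /proc so tools like ps see only the new PID namespace
unshare --user --map-root-user --mount --pid --fork --mount-proc --net bash
# Processes inside cannot see host processes or use host network interfaces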
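To confirm that a service really runs in its own namespaces, compare the namespace identifiers that Linux exposes under /proc/<pid>/ns. Each entry is a symbolic link such as pid:[4026531836], and two processes share a namespace when the numbers match. The following is a rough sketch with hypothetical helper names. It is Linux-only, and reading another user's process may require elevated privileges.

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link: str) -> tuple[str, int]:
    """Parse a /proc/<pid>/ns symlink target such as 'pid:[4026531836]'."""
    match = _NS_LINK.match(link)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def read_namespaces(pid: str = "self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def isolated_types(a: dict[str, int], b: dict[str, int]) -> set[str]:
    """Return namespace entries present in both maps whose inodes differ."""
    return {name for name in a.keys() & b.keys() if a[name] != b[name]}


if __name__ == "__main__" and os.path.isdir("/proc/self/ns"):
    mine = read_namespaces("self")
    init = read_namespaces("self")  # replace "self" with a target PID to compare
    print(sorted(isolated_types(mine, init)))
```

Comparing a sandboxed service's PID against the host shell's PID should list the namespace types that were actually unshared.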

RAG-specific isolation:

Retrieval-augmented generation introduces document isolation challenges. A vector database may index documents from multiple tenants, and queries must return only documents the user is authorized to see.

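# Document-level access control for vector retrieval
# Assumes LangChain-style VectorStore and Document types
from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStore

# Clearance ranks, lowest to highest; the level names are illustrative
ACCESS_HIERARCHY = {"public": 0, "internal": 1, "confidential": 2}

def retrieve_with_access_control(
    query: str,
    user_id: str,  # not used for filtering here; available for audit logging
    user_clearance: str,
    vector_db: VectorStore
) -> list[Document]:
    # Reject an unknown clearance up front instead of failing mid-loop
    if user_clearance not in ACCESS_HIERARCHY:
        raise ValueError(f"unknown clearance: {user_clearance!r}")
    user_rank = ACCESS_HIERARCHY[user_clearance]

    # Fetch more candidates than needed, since filtering discards some
    candidates = vector_db.similarity_search(query, k=20)

    # Filter by access control; a missing or unrecognized access_level
    # is denied rather than treated as public
    authorized = []
    for doc in candidates:
        doc_rank = ACCESS_HIERARCHY.get(doc.metadata.get("access_level"))
        if doc_rank is not None and doc_rank <= user_rank:
            authorized.append(doc)

    return authorized[:5]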
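Filtering after the similarity search has a weakness: if most of the top candidates belong to other tenants, few or no authorized documents survive the filter. Many vector stores accept a metadata filter at query time for this reason. The self-contained sketch below shows the idea with an in-memory list and cosine similarity, restricting the candidate set to one tenant before ranking. The Chunk class, the tenant_id field, and the sample data are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str                 # hypothetical metadata field
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_for_tenant(
    query_vec: tuple[float, ...],
    tenant_id: str,
    chunks: list[Chunk],
    k: int = 5,
) -> list[Chunk]:
    """Restrict to the tenant's chunks first, then rank by similarity."""
    visible = [c for c in chunks if c.tenant_id == tenant_id]
    ranked = sorted(
        visible, key=lambda c: cosine(query_vec, c.embedding), reverse=True
    )
    return ranked[:k]


CHUNKS = [
    Chunk("tenant A pricing notes", "tenant-a", (1.0, 0.0)),
    Chunk("tenant B pricing notes", "tenant-b", (0.99, 0.1)),
    Chunk("tenant B roadmap", "tenant-b", (0.9, 0.3)),
    Chunk("tenant A onboarding guide", "tenant-a", (0.1, 1.0)),
]

if __name__ == "__main__":
    for chunk in search_for_tenant((1.0, 0.0), "tenant-a", CHUNKS, k=2):
        print(chunk.text)
```

Because other tenants' chunks are removed before ranking, they can neither appear in the results nor take slots that authorized documents would otherwise fill.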