HOW-TO · OPS

How to set up network policies for AI service mesh isolation

advanced · 25 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

A Kubernetes cluster whose CNI plugin enforces NetworkPolicies (for example Calico, Cilium, or Antrea)

What this does

This guide uses Kubernetes NetworkPolicies to isolate AI services with explicit ingress and egress rules. Inference pods accept traffic only from the API gateway and reach only model storage; agent pods can reach tool APIs and vector databases; and no pod accepts traffic from a source that a policy does not explicitly allow. This zero-trust baseline limits lateral movement if an AI service is compromised and narrows the network paths through which proprietary model weights could be exfiltrated.

Steps

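  1. Verify that the NetworkPolicy API is available in the cluster:

    kubectl api-resources | grep networkpolicies
    

    Expected output: a networkpolicies line (short name netpol) in the networking.k8s.io/v1 group. This only shows that the API server accepts NetworkPolicy objects; whether they are enforced depends on the CNI plugin (see Common failures).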

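  2. Label the existing namespaces so policies can select them. Step 4 also relies on the namespace that runs your API gateway carrying the label purpose=gateway: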

    kubectl label namespace ai-inference purpose=inference
    kubectl label namespace ai-agents purpose=agents
    kubectl label namespace ai-data purpose=data
    
  3. Create a deny-all default policy for each namespace to establish a zero-trust baseline:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-all
      namespace: ai-inference
    spec:
      podSelector: {}
      policyTypes:
        - Ingress
        - Egress
    

    Expected: once applied, no pod in the namespace can send or receive traffic, including DNS lookups, until the allow rules in the following steps are in place.

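  4. Allow ingress from the API gateway to the inference pods. The port is the one the vLLM container listens on, because NetworkPolicies match pod ports, not Service ports: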

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-from-gateway
      namespace: ai-inference
    spec:
      podSelector:
        matchLabels:
          app: vllm
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  purpose: gateway
          ports:
            - port: 8000
              protocol: TCP
      policyTypes: [Ingress]
    
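  5. Allow egress from inference pods to the model storage service in the ai-data namespace (for example an in-cluster S3-compatible object store listening on 443). PVC-backed volumes are mounted by the kubelet rather than over the pod network, so they need no policy; for an external S3 endpoint, use an ipBlock rule as in step 6: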

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-to-storage
      namespace: ai-inference
    spec:
      podSelector:
        matchLabels:
          app: vllm
      egress:
        - to:
            - namespaceSelector:
                matchLabels:
                  purpose: data
          ports:
            - port: 443
              protocol: TCP
      policyTypes: [Egress]
    
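  6. Allow agent pods to reach tool APIs and the Weaviate vector database. ipBlock rules match by IP address, which makes them the usual way to permit traffic to endpoints outside the cluster. The CIDR below is a placeholder private range; replace it with the addresses of your tool APIs: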

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-to-tools
      namespace: ai-agents
    spec:
      podSelector:
        matchLabels:
          app: ai-agent
      egress:
        - to:
            - ipBlock:
                cidr: 10.0.0.0/8
                except: [10.0.0.0/28]  # Exclude sensitive subnets
          ports:
            - port: 443
              protocol: TCP
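        - to:
            # namespaceSelector and podSelector in one item are ANDed: Weaviate pods in
            # ai-data (assumed location; drop namespaceSelector if Weaviate runs in ai-agents)
            - namespaceSelector:
                matchLabels:
                  purpose: data
              podSelector:
                matchLabels:
                  app: weaviate
          ports:
            - port: 8080
              protocol: TCP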
      policyTypes: [Egress]
    
  7. Allow DNS egress so pods can resolve Service names. The manifest below covers ai-inference; every namespace with a deny-all baseline needs its own copy:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-dns
      namespace: ai-inference
    spec:
      podSelector: {}
      egress:
        - to:
            - namespaceSelector: {}
              podSelector:
                matchLabels:
                  k8s-app: kube-dns
          ports:
            - port: 53
              protocol: UDP
            - port: 53
              protocol: TCP  # DNS falls back to TCP for large responses
      policyTypes: [Egress]
    
  8. Test isolation by requesting the vllm Service in ai-inference from a pod in a namespace that no policy allows:

    kubectl run -it --rm test-pod --image=alpine --restart=Never --namespace=default -- \
      wget -qO- -T 5 http://vllm.ai-inference.svc.cluster.local:8000/health
    

    Expected: the request times out, because no policy in ai-inference admits traffic from the default namespace. If it succeeds, check that your CNI plugin enforces NetworkPolicies.
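
Steps 3 and 7 show their manifests for ai-inference only, but ai-agents and ai-data need the same baseline. A sketch of both policies for those two namespaces as one multi-document manifest, assuming the namespace names from step 2 and CoreDNS pods labeled k8s-app: kube-dns (the label step 7 already relies on). TCP 53 is included alongside UDP because DNS falls back to TCP for large responses:

```yaml
# Default deny for ai-agents (same baseline as step 3)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: ai-agents
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Default deny for ai-data
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: ai-data
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# DNS egress for ai-agents: kube-dns pods in any namespace, UDP and TCP 53
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: ai-agents
spec:
  podSelector: {}
  egress:
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
  policyTypes: [Egress]
---
# DNS egress for ai-data (same rule, different namespace)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: ai-data
spec:
  podSelector: {}
  egress:
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
  policyTypes: [Egress]
```

Applying a namespace's deny-all and allow-dns policies in the same kubectl apply avoids a window in which its pods cannot resolve names.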

Verification

kubectl get networkpolicy -A --no-headers | wc -l

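Expected output: the number of NetworkPolicy objects across all namespaces. The manifests shown in the steps create at least 5; the count rises once deny-all and allow-dns exist in every namespace. The count only confirms that the objects exist; step 8 checks that they are enforced.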
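
Step 8 checks that unauthorized traffic is dropped; a complementary check is that the permitted path still works. A sketch of a one-shot probe pod for that, assuming the API gateway runs in a namespace named ai-gateway (a hypothetical name) that carries the label purpose=gateway, and reusing the vllm Service URL from step 8:

```yaml
# One-shot probe from the gateway namespace, which the allow-from-gateway
# policy (step 4) admits on port 8000.
apiVersion: v1
kind: Pod
metadata:
  name: netpol-probe           # hypothetical name
  namespace: ai-gateway        # hypothetical gateway namespace, labeled purpose=gateway
spec:
  restartPolicy: Never         # run once; do not restart on failure
  containers:
    - name: probe
      image: alpine
      # BusyBox wget: -q quiet, -O- print the body to stdout, -T 5 five-second timeout
      command: ["wget", "-qO-", "-T", "5", "http://vllm.ai-inference.svc.cluster.local:8000/health"]
```

After kubectl apply -f, kubectl get pod netpol-probe -n ai-gateway should show Completed if the request succeeded; Error means wget failed, and kubectl logs shows why. Delete the pod before rerunning the check.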

Common failures

  • Service-to-service communication broken: a deny-all policy was applied without the corresponding allow rules. Audit each namespace with kubectl describe networkpolicy -n <namespace> and make sure every legitimate communication path has an explicit allow rule.
  • DNS resolution fails: the deny-all policy blocks port 53 (UDP and TCP) to CoreDNS. Add the allow-dns policy from step 7 to every namespace with a deny-all baseline.
  • Network policy has no effect: the CNI plugin may not enforce Kubernetes NetworkPolicies (Flannel on its own, for example, does not). Check for a compatible CNI with kubectl get pods -A | grep -E "calico|cilium|antrea"; its pods may run outside kube-system (the Calico operator, for instance, installs into calico-system).
  • Ingress allowed but connections still fail: when both the source and destination namespaces have a deny-all baseline, a connection needs an egress rule on the source and an ingress rule on the destination. Reply traffic on an allowed connection is permitted automatically, so no separate return-path rule is needed.
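
ai-data also gets a deny-all baseline in step 3, so the egress rules in steps 5 and 6 are not sufficient on their own: a connection is allowed only when the source's egress rules and the destination's ingress rules both permit it. A sketch of the matching ingress rules in ai-data, assuming the model store there listens on 443 (as step 5 assumes) and that Weaviate runs in ai-data with the label app: weaviate. The policy names are illustrative. For the Weaviate rule to matter, the agents' egress rule in step 6 must also select the ai-data namespace, since a podSelector on its own matches only pods in the policy's own namespace.

```yaml
# Admit model-storage traffic from vLLM pods in the inference namespace (pairs with step 5).
# podSelector {} applies the rule to every pod in ai-data; narrow it with the
# labels of your storage pods if ai-data hosts other services.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-inference
  namespace: ai-data
spec:
  podSelector: {}
  ingress:
    - from:
        # Same list item, so both selectors must match (AND)
        - namespaceSelector:
            matchLabels:
              purpose: inference
          podSelector:
            matchLabels:
              app: vllm
      ports:
        - port: 443
          protocol: TCP
  policyTypes: [Ingress]
---
# Admit agent traffic to Weaviate (pairs with step 6), assuming Weaviate runs in ai-data
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-agents-to-weaviate
  namespace: ai-data
spec:
  podSelector:
    matchLabels:
      app: weaviate
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              purpose: agents
          podSelector:
            matchLabels:
              app: ai-agent
      ports:
        - port: 8080
          protocol: TCP
  policyTypes: [Ingress]
```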
