How to set up network policies for AI service mesh isolation
Kubernetes with CNI supporting network policies
What this does
This guide configures Kubernetes Network Policies to isolate the AI services in a service mesh behind explicit ingress and egress rules. Inference pods communicate only with the API gateway and model storage, agent pods can reach only tool APIs and vector databases, and no pod accepts traffic from unauthorized sources. This zero-trust networking model limits lateral movement if an AI service is compromised and makes it harder to exfiltrate proprietary model weights.
Steps
Verify network policy support is enabled in the cluster:
    kubectl api-resources | grep networkpolicies
Expected output: networkpolicies in the list, confirming the API resource exists. This check does not show whether the CNI actually enforces policies; see Common failures for that.
Apply namespace labels for policy targeting:
    kubectl label namespace ai-inference purpose=inference
    kubectl label namespace ai-agents purpose=agents
    kubectl label namespace ai-data purpose=data
Create a deny-all default policy for each namespace to establish a zero-trust baseline:
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-all
      namespace: ai-inference
    spec:
      podSelector: {}
      policyTypes:
        - Ingress
        - Egress
Expected: after applying, no pods in the namespace can receive or send traffic.
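The manifest above targets only ai-inference, while the step calls for one deny-all policy per namespace. A minimal sketch of generating the same manifest for all three namespaces follows. The function names render_deny_all and render_all_deny_all are illustrative, not part of any tool, and the namespace list is assumed from the labeling step.

```shell
# Namespaces assumed from the labeling step; adjust to match your cluster.
NAMESPACES="ai-inference ai-agents ai-data"

# Print a deny-all NetworkPolicy manifest for the namespace given as $1.
render_deny_all() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Emit one multi-document YAML stream with one policy per namespace.
render_all_deny_all() {
  for ns in $NAMESPACES; do
    echo "---"
    render_deny_all "$ns"
  done
}

# To apply (requires cluster access):
#   render_all_deny_all | kubectl apply -f -
```

Rendering to stdout first makes it possible to review the manifests before piping them to kubectl apply -f -.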
Allow ingress from the API gateway to inference pods:
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-from-gateway
      namespace: ai-inference
    spec:
      podSelector:
        matchLabels:
          app: vllm
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  purpose: gateway
          ports:
            - port: 8000
              protocol: TCP
      policyTypes: [Ingress]
This rule matches only namespaces labeled purpose=gateway. The labeling step covers the three AI namespaces only, so label the gateway's namespace the same way.
Allow egress from inference pods to model storage (S3 or PVC backend):
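    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-to-storage
      namespace: ai-inference
    spec:
      podSelector:
        matchLabels:
          app: vllm
      egress:
        - to:
            - namespaceSelector:
                matchLabels:
                  purpose: data
          ports:
            - port: 443
              protocol: TCP
      policyTypes: [Egress]
A namespaceSelector matches in-cluster pods only, so this rule covers a storage endpoint running in the namespace labeled purpose=data. An S3 endpoint outside the cluster would need an ipBlock rule instead.
Allow agent pods to reach external tool APIs. Use an IP block for external services: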
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-to-tools
      namespace: ai-agents
    spec:
      podSelector:
        matchLabels:
          app: ai-agent
      egress:
        - to:
            - ipBlock:
                cidr: 10.0.0.0/8        # Example range; replace with the CIDR of your tool APIs
                except: [10.0.0.0/28]   # Exclude sensitive subnets
          ports:
            - port: 443
              protocol: TCP
        - to:
            - podSelector:              # Without a namespaceSelector, matches pods in ai-agents only
                matchLabels:
                  app: weaviate
          ports:
            - port: 8080
              protocol: TCP
      policyTypes: [Egress]
Allow DNS resolution egress from all namespaces (necessary for service discovery):
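    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-dns
      namespace: ai-inference
    spec:
      podSelector: {}
      egress:
        - to:
            - namespaceSelector: {}
              podSelector:
                matchLabels:
                  k8s-app: kube-dns
          ports:
            - port: 53
              protocol: UDP
            - port: 53              # DNS falls back to TCP for large responses
              protocol: TCP
      policyTypes: [Egress]
A NetworkPolicy is namespaced, so apply one copy of this policy in each namespace that has a deny-all baseline, changing metadata.namespace each time.
Test isolation by attempting to reach a restricted service from an unauthorized pod: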
    kubectl run -it --rm test-pod --image=alpine --namespace=default -- sh
    # Inside the pod shell (-T 5 gives up after five seconds):
    wget -qO- -T 5 http://vllm.ai-inference.svc.cluster.local:8000/health
Expected: connection timeout, confirming that the deny-all policy blocks cross-namespace traffic.
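The interactive test above can also be scripted. The sketch below is one possible non-interactive variant. The function name expect_blocked and the pod name prefix netpol-probe are hypothetical, and it assumes that kubectl run with --rm -i --restart=Never returns a non-zero status when the command in the pod fails, which is worth confirming on your kubectl version. A failure for an unrelated reason, such as an image pull error, would also be reported as blocked.

```shell
# Return 0 if an HTTP request from a throwaway pod in namespace $1 to URL $2
# fails (the outcome isolation should produce), 1 if the request succeeds.
expect_blocked() {
  src_ns="$1"
  url="$2"
  # $$ (the shell's PID) keeps the probe pod name unique per run.
  if kubectl run "netpol-probe-$$" --image=alpine --namespace="$src_ns" \
       --rm -i --restart=Never -- wget -qO- -T 5 "$url" >/dev/null 2>&1; then
    echo "NOT BLOCKED: $src_ns -> $url"
    return 1
  fi
  echo "blocked: $src_ns -> $url"
  return 0
}

# Usage (requires cluster access):
#   expect_blocked default http://vllm.ai-inference.svc.cluster.local:8000/health
```

Because the function returns a status code, it can gate a CI job: the job fails as soon as a path that should be closed turns out to be reachable.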
Verification
kubectl get networkpolicy -A --no-headers | wc -l
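Expected output: the total count of Network Policies across all namespaces. The five policies defined above give a count of at least 5, or at least 9 once deny-all and allow-dns exist in all three namespaces. The count confirms that the objects exist, not that the CNI enforces them.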
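A single total hides which namespace is missing a policy. As a rough sketch, the same kubectl output can be grouped by namespace. The function name count_policies_by_namespace is illustrative, and the parsing assumes the default column order of kubectl get networkpolicy -A --no-headers (NAMESPACE, NAME, POD-SELECTOR, AGE).

```shell
# Read `kubectl get networkpolicy -A --no-headers` output on stdin and
# print one "<namespace> <count>" line per namespace, sorted by name.
count_policies_by_namespace() {
  awk 'NF { count[$1]++ } END { for (ns in count) print ns, count[ns] }' | sort
}

# Usage (requires cluster access):
#   kubectl get networkpolicy -A --no-headers | count_policies_by_namespace
```

With the policies from this guide in place, each of the three AI namespaces should report at least two policies (deny-all and allow-dns).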
Common failures
- Service-to-service communication broken: a deny-all policy was applied without corresponding allow rules. Audit with kubectl describe networkpolicy in each namespace and ensure every legitimate communication path has an explicit allow rule.
- DNS resolution fails: the deny-all policy blocks DNS traffic (port 53) to CoreDNS. Add the allow-dns policy described in step 7 to every namespace with a deny-all baseline.
- Network policy has no effect: the CNI may not support Kubernetes Network Policies. Check with kubectl get pods -n kube-system | grep -E "calico|cilium|antrea" to confirm a compatible CNI is running.
- Ingress allowed but traffic still blocked: a connection must be permitted by both the destination pod's ingress rules and the source pod's egress rules. If the client's namespace also has a deny-all baseline, add an egress rule there for the same destination and port. Reply packets on an allowed connection need no separate rule.
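The DNS failure above is easy to miss when namespaces are added over time. The following sketch lists namespaces that have a deny-all policy but no allow-dns policy. The function name namespaces_missing_dns is illustrative. It matches on the policy names used in this guide only and does not inspect what each policy actually allows.

```shell
# Read `kubectl get networkpolicy -A --no-headers` output on stdin
# (columns: NAMESPACE NAME POD-SELECTOR AGE) and print every namespace
# that has a policy named deny-all but none named allow-dns.
namespaces_missing_dns() {
  awk '
    $2 == "deny-all"  { deny[$1] = 1 }
    $2 == "allow-dns" { dns[$1] = 1 }
    END { for (ns in deny) if (!(ns in dns)) print ns }
  ' | sort
}

# Usage (requires cluster access):
#   kubectl get networkpolicy -A --no-headers | namespaces_missing_dns
```

Empty output means every namespace with a deny-all baseline also has a policy named allow-dns.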