Services and Ingress — Production Local AI Deployment (Chapter 10)

Kubernetes Services provide stable network endpoints for transient pods. The service abstraction decouples consumers from pod IP addresses that change during rescheduling, scaling, and rolling updates. Service types determine network exposure scope.

ClusterIP is the default service type. A ClusterIP service receives a virtual IP address that is reachable only from inside the cluster, which makes it the stable target for workloads that never need external access, such as an inference server called only by other in-cluster components.

apiVersion: v1
kind: Service
metadata:
  name: inference-service
  namespace: ai-inference
spec:
  type: ClusterIP
  selector:
    app: inference-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: 9090
      protocol: TCP

NodePort services open the same port on every node's IP address. External traffic reaches the service through nodeIP:nodePort, and the allowed port range defaults to 30000-32767. The pattern suits development and edge deployments without load balancer infrastructure.
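
As a minimal sketch, a NodePort variant of the inference service might look like the manifest below. The name inference-service-nodeport and the nodePort value 30080 are illustrative assumptions; leaving nodePort out lets Kubernetes choose a free port from the range.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-service-nodeport   # hypothetical name
  namespace: ai-inference
spec:
  type: NodePort
  selector:
    app: inference-server
  ports:
    - name: http
      port: 80          # port on the cluster-internal service IP
      targetPort: 8080  # container port on the pods
      nodePort: 30080   # assumed value; must fall inside the NodePort range
      protocol: TCP
```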

LoadBalancer services integrate with cloud provider control planes to provision external load balancers. On-premises deployments require MetalLB or similar bare-metal load balancer implementations to provide LoadBalancer functionality.
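
The following sketch shows one way this could look on bare metal. It assumes MetalLB is already installed in the metallb-system namespace, uses its CRD-based configuration in Layer 2 mode, and that the address range shown is free on the local network. The resource names and the range are placeholders, not values from this deployment.

```yaml
# Pool of addresses MetalLB may hand out (placeholder range)
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: inference-pool          # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.240-192.168.10.250
---
# Announce the pool on the local network using Layer 2 mode
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: inference-l2            # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - inference-pool
---
# LoadBalancer service; the external IP is assigned from the pool above
apiVersion: v1
kind: Service
metadata:
  name: inference-service-lb    # hypothetical name
  namespace: ai-inference
spec:
  type: LoadBalancer
  selector:
    app: inference-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
```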

Ingress resources manage HTTP/HTTPS routing at the application layer. Ingress controllers implement the routing rules and terminate TLS connections. The ingress pattern enables host-based and path-based routing to multiple backend services.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress
  namespace: ai-inference
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - inference.example.com
      secretName: inference-tls-cert
  rules:
    - host: inference.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-service
                port:
                  number: 80
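
The manifest above routes a single host and path. As a sketch of path-based routing to multiple backends, the variant below splits traffic by URL prefix. The Ingress name, the paths, and the services chat-service and embedding-service are hypothetical and stand in for whatever backends a deployment actually runs.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress-paths      # hypothetical name
  namespace: ai-inference
spec:
  ingressClassName: nginx
  rules:
    - host: inference.example.com
      http:
        paths:
          - path: /v1/chat           # assumed path
            pathType: Prefix
            backend:
              service:
                name: chat-service   # hypothetical backend
                port:
                  number: 80
          - path: /v1/embeddings     # assumed path
            pathType: Prefix
            backend:
              service:
                name: embedding-service   # hypothetical backend
                port:
                  number: 80
```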

Headless services set clusterIP: None, so no virtual IP is allocated and a DNS lookup of the service name returns the IP addresses of the backing pods directly. Consumer applications that perform their own load balancing use headless services for direct pod discovery. The pattern supports inference clients that implement custom connection pooling.
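
A minimal headless service for the same pods could be sketched as follows. The name inference-headless is an assumption, and the port is set to the container port because clients connect to pods directly rather than through a service IP.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-headless   # hypothetical name
  namespace: ai-inference
spec:
  clusterIP: None            # headless: no virtual IP is allocated
  selector:
    app: inference-server
  ports:
    - name: http
      port: 8080             # clients reach pods directly on the container port
      targetPort: 8080
      protocol: TCP
```

With the default cluster domain, resolving inference-headless.ai-inference.svc.cluster.local from inside the cluster would then return one A record per ready pod.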

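EndpointSlice objects track the pod IPs behind services at scale. The EndpointSlice controller groups endpoints into slices so that a change to one pod updates only a small object, which keeps watch operations efficient. Each slice holds up to 100 endpoints by default, so a service backed by more than 100 pods is automatically spread across multiple EndpointSlices.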
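
For orientation, a controller-generated slice for inference-service might resemble the sketch below. These objects are normally created by the control plane rather than written by hand; the generated name suffix, pod IP, and node name are placeholders.

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: inference-service-abc12        # placeholder for a generated name
  namespace: ai-inference
  labels:
    kubernetes.io/service-name: inference-service   # links the slice to its service
addressType: IPv4
ports:
  - name: http
    port: 8080                         # the service's targetPort
    protocol: TCP
endpoints:
  - addresses:
      - "10.244.1.17"                  # placeholder pod IP
    conditions:
      ready: true
    nodeName: node-a                   # placeholder node name
```

The slices for a service can be listed with kubectl get endpointslices -n ai-inference -l kubernetes.io/service-name=inference-service.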

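# Create a ClusterIP service
kubectl expose deployment inference-server \
  --name=inference-svc \
  --type=ClusterIP \
  --port=80 \
  --target-port=8080 \
  --namespace=ai-inference

# Create a NodePort service for development access
# (kubectl expose cannot set the node port, so pin it with a patch)
kubectl expose deployment inference-server \
  --name=inference-svc-dev \
  --type=NodePort \
  --port=80 \
  --target-port=8080 \
  --namespace=ai-inference
kubectl patch service inference-svc-dev \
  --namespace=ai-inference \
  --type=json \
  -p='[{"op": "replace", "path": "/spec/ports/0/nodePort", "value": 30080}]'

# Create the TLS secret referenced by the Ingress, then apply the Ingress
kubectl create secret tls inference-tls-cert \
  --cert=inference.crt \
  --key=inference.key \
  --namespace=ai-inference
kubectl apply -f ingress.yaml

# Verify all networking layers
kubectl get svc,ingress --namespace=ai-inference
kubectl describe ingress inference-ingress --namespace=ai-inference
# Assumes the ingress controller is reachable on localhost:443
curl -k --resolve inference.example.com:443:127.0.0.1 https://inference.example.com/health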