10. Services and Ingress
Kubernetes Services provide stable network endpoints for transient pods. A Service selects pods by label and gives them a stable virtual IP and DNS name, decoupling consumers from pod IP addresses that change during rescheduling, scaling, and rolling updates. The service type (ClusterIP, NodePort, or LoadBalancer) determines how widely the service is exposed.
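ClusterIP, the default service type, allocates a virtual IP that is reachable only from inside the cluster. Use it as the stable target for workloads that never need external access. In-cluster clients usually address it by DNS name, for example inference-service.ai-inference.svc.cluster.local (assuming the default cluster domain), or simply inference-service from pods in the same namespace.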
apiVersion: v1
kind: Service
metadata:
  name: inference-service
  namespace: ai-inference
spec:
  type: ClusterIP
  selector:
    app: inference-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: 9090
      protocol: TCP
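NodePort services open the same port on every node's IP address and forward traffic arriving there to the service's pods; a NodePort service also receives a ClusterIP. External clients connect through nodeIP:nodePort. The node port range defaults to 30000-32767 and can be changed with the kube-apiserver --service-node-port-range flag. The pattern suits development and edge deployments that lack load balancer infrastructure.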
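A minimal sketch of a NodePort manifest for development access, assuming the same app: inference-server pod labels as the ClusterIP example. The name inference-service-dev and the fixed nodePort value are illustrative choices; omitting nodePort lets the API server pick a free port from the range.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-service-dev   # hypothetical name for the dev-access service
  namespace: ai-inference
spec:
  type: NodePort
  selector:
    app: inference-server       # same pod labels as the ClusterIP example
  ports:
    - name: http
      port: 80                  # port on the service's ClusterIP
      targetPort: 8080          # container port on the inference pods
      nodePort: 30080           # opened on every node; must be inside the node port range
      protocol: TCP
```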
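LoadBalancer services build on NodePort: a cloud provider controller provisions an external load balancer that forwards traffic into the cluster and records its address in the service's status. On-premises clusters have no such controller, so they need MetalLB or a similar bare-metal load balancer implementation; without one, the service's EXTERNAL-IP stays at <pending>.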
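A sketch of a LoadBalancer service plus the MetalLB objects a bare-metal cluster might use, assuming MetalLB v0.13 or later (CRD-based configuration) installed in the metallb-system namespace and running in Layer 2 mode. The service name, pool name, and address range are hypothetical; the range must be unused addresses on your network.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-service-lb        # hypothetical name
  namespace: ai-inference
spec:
  type: LoadBalancer
  selector:
    app: inference-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
---
# MetalLB: the pool of addresses it may hand out to LoadBalancer services
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: inference-pool              # hypothetical pool name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.240-192.168.10.250 # hypothetical range on the local network
---
# MetalLB: announce addresses from the pool via Layer 2 (ARP/NDP)
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: inference-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - inference-pool
```

Once MetalLB assigns an address from the pool, kubectl get svc -n ai-inference should show it under EXTERNAL-IP instead of <pending>.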
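Ingress resources manage HTTP/HTTPS routing at the application layer. An Ingress only declares rules: an Ingress controller (here the one registered under the IngressClass nginx, such as ingress-nginx) watches the resources, implements their routing rules, and terminates TLS. A single Ingress can route to multiple backend services by host name and by URL path. The nginx.ingress.kubernetes.io annotations are specific to ingress-nginx; other controllers ignore them.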
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress
  namespace: ai-inference
  annotations:
    # Redirect plain HTTP requests to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    # Accept request bodies up to 50 MB (large prompts or input payloads)
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    # Wait up to 300 seconds for the backend to respond (long-running inference)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - inference.example.com
      secretName: inference-tls-cert
  rules:
    - host: inference.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-service
                port:
                  number: 80
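The manifest above routes one host and one path to one service. To illustrate host-based and path-based routing to multiple backends, here is a sketch of a fanout Ingress; embedding-service, model-registry, and registry.example.com are hypothetical, and TLS is left out for brevity.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-fanout                # hypothetical name
  namespace: ai-inference
spec:
  ingressClassName: nginx
  rules:
    # Path-based routing: one host, two backends
    - host: inference.example.com
      http:
        paths:
          - path: /embeddings
            pathType: Prefix            # matches /embeddings and /embeddings/..., not /embeddingsx
            backend:
              service:
                name: embedding-service # hypothetical second backend
                port:
                  number: 80
          - path: /                     # longest matching path wins; everything else lands here
            pathType: Prefix
            backend:
              service:
                name: inference-service
                port:
                  number: 80
    # Host-based routing: a separate host name goes to a different backend
    - host: registry.example.com        # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: model-registry    # hypothetical service
                port:
                  number: 80
```

Without a rewrite annotation, ingress-nginx forwards the original request path, so embedding-service would receive requests under /embeddings/ unchanged.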
Setting clusterIP: None creates a headless service: no virtual IP is allocated, and a DNS lookup of the service name returns the IPs of the ready pods directly. Clients that perform their own load balancing use headless services for direct pod discovery; the pattern supports inference clients that implement custom connection pooling.
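A minimal headless variant of the inference service, as a sketch; the name inference-headless is hypothetical and the selector assumes the same app: inference-server labels used throughout this section.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-headless     # hypothetical name
  namespace: ai-inference
spec:
  clusterIP: None              # headless: no virtual IP, DNS returns pod IPs
  selector:
    app: inference-server
  ports:
    - name: http               # named ports also get DNS SRV records
      port: 8080
      targetPort: 8080
      protocol: TCP
```

From a pod in the cluster, resolving inference-headless.ai-inference.svc.cluster.local (default cluster domain assumed) should return one A record per ready pod rather than a single virtual IP, which the client can then spread connections across.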
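EndpointSlice objects track the pod IPs behind services at scale. The EndpointSlice controller caps each slice at 100 endpoints by default (configurable with the kube-controller-manager --max-endpoints-per-slice flag), so a service selecting more than 100 pods is automatically split across multiple slices. Because a pod change rewrites only the slice that contains it, watchers such as kube-proxy receive much smaller updates than they would from a single large Endpoints object.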
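For orientation, this is roughly what one controller-managed slice behind inference-service might look like when read back with kubectl get endpointslices -n ai-inference -l kubernetes.io/service-name=inference-service -o yaml. The generated name suffix, pod names, IPs, and node names are invented for illustration; for selector-based services the controller writes these objects, so you read them rather than author them.

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: inference-service-7xk2p           # generated suffix (illustrative)
  namespace: ai-inference
  labels:
    kubernetes.io/service-name: inference-service   # links the slice to its Service
    endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
addressType: IPv4
ports:                                    # pod (target) ports, named like the Service ports
  - name: http
    protocol: TCP
    port: 8080
  - name: metrics
    protocol: TCP
    port: 9090
endpoints:                                # at most 100 entries per slice by default
  - addresses:
      - 10.244.1.17                       # illustrative pod IP
    conditions:
      ready: true
    nodeName: gpu-node-1                  # illustrative node name
    targetRef:
      kind: Pod
      name: inference-server-6d9f8-abcde  # illustrative pod name
      namespace: ai-inference
  - addresses:
      - 10.244.2.23
    conditions:
      ready: false                        # e.g. failing its readiness probe; receives no traffic
    nodeName: gpu-node-2
    targetRef:
      kind: Pod
      name: inference-server-6d9f8-fghij
      namespace: ai-inference
```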
Configure service networking for an AI inference stack with multiple components. Create a ClusterIP service for the inference deployment, a NodePort service for external development access, and an Ingress resource with TLS termination routing to the internal service. Verify connectivity through each layer.
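# Create ClusterIP service (name matches the backend in the Ingress manifest)
kubectl expose deployment inference-server \
  --namespace=ai-inference \
  --name=inference-service \
  --type=ClusterIP \
  --port=80 \
  --target-port=8080
# Create NodePort for development access
# (kubectl expose has no --node-port flag, so pin the port with a patch)
kubectl expose deployment inference-server \
  --namespace=ai-inference \
  --name=inference-svc-dev \
  --type=NodePort \
  --port=80 \
  --target-port=8080
kubectl patch service inference-svc-dev --namespace=ai-inference --type=json \
  -p '[{"op":"replace","path":"/spec/ports/0/nodePort","value":30080}]'
# Create the TLS secret referenced by the Ingress, then the Ingress itself
kubectl create secret tls inference-tls-cert \
  --cert=inference.crt \
  --key=inference.key \
  --namespace=ai-inference
# ingress.yaml holds the inference-ingress manifest shown earlier
kubectl apply -f ingress.yaml
# Verify all networking layers
kubectl get svc,ingress --namespace=ai-inference
kubectl describe ingress inference-ingress --namespace=ai-inference
# ClusterIP: call the service from a temporary pod inside the cluster
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl \
  --namespace=ai-inference -- curl -s http://inference-service/health
# NodePort: use a node address reachable from your machine (InternalIP shown here)
NODE_IP=$(kubectl get nodes \
  -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
curl -s "http://${NODE_IP}:30080/health"
# Ingress: send the expected host name (and SNI) to the controller's address
# (some providers publish .hostname instead of .ip in the Ingress status)
INGRESS_IP=$(kubectl get ingress inference-ingress --namespace=ai-inference \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -k --resolve "inference.example.com:443:${INGRESS_IP}" \
  https://inference.example.com/health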