Kubernetes Deployments — Production Local AI Deployment (Chapter 9)

Deployments manage pod lifecycle through ReplicaSets, providing declarative updates, rollback, and revision history. The Deployment controller continuously reconciles toward the desired state. When the pod template changes, it creates a new ReplicaSet, scales it up while scaling the old ones down, and deletes old ReplicaSets beyond the retention limit.

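Rolling updates replace pods incrementally, so the Deployment keeps serving traffic throughout the update. Two parameters control the pace, and each accepts an absolute count or a percentage of replicas (both default to 25%). maxSurge sets how many pods may exist above the desired replica count during the update, and maxUnavailable sets how many of the desired pods may be unavailable. With maxSurge > 0, replacements can start before old pods are removed, giving temporary overcapacity. With maxUnavailable: 0, an old pod is terminated only after a replacement passes its readiness probe, so ready capacity never drops below the replica count. The two cannot both be 0.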

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
  namespace: ai-inference
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
        version: v1.4.2
    spec:
      containers:
        - name: inference
          image: inference/model-server:v1.4.2
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "4Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi"
              cpu: "2000m"
              nvidia.com/gpu: 1
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          env:
            - name: MODEL_VERSION
              value: "v1.4.2"
            - name: INFERENCE_BATCH_SIZE
              value: "32"
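To see what these settings mean in pod counts, the following POSIX shell sketch resolves maxSurge and maxUnavailable against a replica count. It assumes the rounding rules from the Kubernetes Deployment documentation (percentages round up for maxSurge and down for maxUnavailable) and the controller's fallback to maxUnavailable=1 when both resolve to zero. resolve_value and rollout_bounds are illustrative helper names, not part of kubectl.

```shell
#!/bin/sh
# Sketch: pod-count bounds a RollingUpdate Deployment keeps during a rollout.
# resolve_value VALUE REPLICAS up|down
#   VALUE is an absolute count ("2") or a percentage ("25%").
#   Assumed rule: percentages round up for maxSurge, down for maxUnavailable.
resolve_value() {
  case $1 in
    *%)
      pct=${1%\%}                      # strip the trailing "%"
      if [ "$3" = up ]; then
        echo $(( ($2 * pct + 99) / 100 ))   # integer ceiling
      else
        echo $(( $2 * pct / 100 ))          # integer floor
      fi
      ;;
    *)
      echo "$1"
      ;;
  esac
}

# rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
rollout_bounds() {
  surge=$(resolve_value "$2" "$1" up)
  unavail=$(resolve_value "$3" "$1" down)
  # If both resolve to 0 (possible after rounding), fall back to maxUnavailable=1.
  if [ "$surge" -eq 0 ] && [ "$unavail" -eq 0 ]; then
    unavail=1
  fi
  echo "max_pods=$(( $1 + surge )) min_available=$(( $1 - unavail ))"
}

rollout_bounds 6 2 0       # prints: max_pods=8 min_available=6
rollout_bounds 6 25% 25%   # prints: max_pods=8 min_available=5
```

For the manifest above, up to 8 pods can exist during a rollout while at least 6 stay available. Because each pod limits nvidia.com/gpu to 1, the cluster needs two spare GPUs for the surge pods. Without them the new pods stay Pending and, with maxUnavailable: 0, the rollout stalls rather than removing old pods.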

Rollback restores an earlier pod template. The spec.revisionHistoryLimit field (default 10) controls how many old ReplicaSets Kubernetes retains, and only retained revisions can be rolled back to. kubectl rollout history lists the available revisions; kubectl rollout undo rolls back to the previous revision, or to a specific one with --to-revision.

# View deployment history
kubectl rollout history deployment inference-server -n ai-inference

# Roll back to the previous revision
kubectl rollout undo deployment inference-server -n ai-inference

# Roll back to a specific revision
kubectl rollout undo deployment inference-server -n ai-inference \
  --to-revision=2

# Watch rollout progress
kubectl rollout status deployment inference-server -n ai-inference
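Cleanup of old ReplicaSets under revisionHistoryLimit can be modelled in a few lines. The sketch below uses a hypothetical helper, retained_revisions, and assumes the revisions are listed oldest first with the current one last, and that every old ReplicaSet has already scaled to zero (the controller only deletes old ReplicaSets with no replicas left). It keeps the newest LIMIT old revisions, which are the ones a rollback can still target.

```shell
#!/bin/sh
# retained_revisions LIMIT REV...
#   REV... are revision numbers, oldest first; the last one is the current revision.
#   Prints the old revisions that survive cleanup (hypothetical helper, a sketch only).
retained_revisions() {
  limit=$1
  shift
  n_old=$(( $# - 1 ))            # every revision except the current one
  drop=$(( n_old - limit ))      # how many of the oldest revisions get deleted
  if [ "$drop" -lt 0 ]; then drop=0; fi
  kept=""
  i=0
  for rev in "$@"; do
    i=$(( i + 1 ))
    # Skip the oldest "drop" revisions and the current (last) one.
    if [ "$i" -gt "$drop" ] && [ "$i" -le "$n_old" ]; then
      kept="$kept${kept:+ }$rev"
    fi
  done
  echo "$kept"
}

retained_revisions 3 1 2 3 4 5 6 7 8   # prints: 5 6 7
```

With revisionHistoryLimit: 0 the result is empty, so once a rollout finishes it can no longer be undone.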

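Only changes to the pod template trigger a rollout. The controller hashes the template, records the result in the pod-template-hash label, and, when the hash changes, creates a new ReplicaSet (or reuses an old one whose template matches). The deployment.kubernetes.io/revision annotation on the Deployment records the current revision number; each ReplicaSet carries the same annotation plus a copy of the pod template it was created from, and that stored template is what a rollback restores. The revision number increments each time the template changes, whether or not the resulting rollout succeeds; scaling alone never creates a revision.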
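Rolling back does not append a copy of the old template to the history. The controller reuses the target revision's ReplicaSet and gives it the next revision number, so the old number disappears from kubectl rollout history. The hypothetical helper below models only that bookkeeping, assuming the history is listed oldest first and the target is an older revision that is still retained.

```shell
#!/bin/sh
# history_after_undo TARGET REV...
#   REV... are revision numbers, oldest first; the last one is the current revision.
#   Assumes TARGET is an older revision still present in the history (sketch only).
history_after_undo() {
  target=$1
  shift
  out=""
  last=0
  for rev in "$@"; do
    last=$rev
    # The target's ReplicaSet is reused, so its old number leaves the list...
    if [ "$rev" -ne "$target" ]; then
      out="$out${out:+ }$rev"
    fi
  done
  # ...and reappears under the next revision number.
  echo "$out${out:+ }$(( last + 1 ))"
}

history_after_undo 2 1 2 3 4   # prints: 1 3 4 5
```

In this example, revision 5 runs the template that used to be revision 2.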

Deployments scale independently of updates: kubectl scale or an edit to spec.replicas changes the replica count without creating a new revision or ReplicaSet. When a HorizontalPodAutoscaler targets the Deployment, it manages the replica count and overwrites manual changes on its next sync.
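An autoscaler wins over manual scaling because it recomputes the replica count on every sync. The sketch below implements the core formula from the HorizontalPodAutoscaler documentation, desired = ceil(current_replicas * current_metric / target_metric), clamped to minReplicas and maxReplicas. It deliberately ignores the default 10% tolerance, stabilization windows, and the special handling of unready pods or missing metrics; hpa_desired_replicas is a hypothetical name.

```shell
#!/bin/sh
# hpa_desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
#   Metrics are integers in the same unit, e.g. average CPU utilization in percent.
#   Simplified sketch: no tolerance band, no stabilization window.
hpa_desired_replicas() {
  # Integer ceiling of current * metric / target
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  # Clamp to the autoscaler's bounds
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}

hpa_desired_replicas 6 90 60 2 10   # prints: 9
```

If someone runs kubectl scale to 6 replicas while average utilization is 90% against a 60% target, the next HPA sync computes 9 and overwrites the manual value. For the same reason, the HPA documentation recommends removing spec.replicas from the manifests of autoscaled Deployments so that kubectl apply does not reset the count.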

# Initial deployment
kubectl apply -f deployment.yaml
kubectl get pods -n ai-inference -l app=inference-server --watch

# Trigger an update
kubectl set image deployment/inference-server -n ai-inference \
  inference=inference/model-server:v1.5.0

# Observe the rollout
kubectl rollout status deployment/inference-server -n ai-inference

# Simulate a rollback scenario
kubectl rollout history deployment/inference-server -n ai-inference
kubectl rollout undo deployment/inference-server -n ai-inference

# Verify the rollback
kubectl describe deployment inference-server -n ai-inference \
  | grep -A5 "Annotations:"