RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Production Local AI Deployment

09. Kubernetes Deployments

Chapter 9 of 24 · 20 min
KEY INSIGHT

Deployments abstract ReplicaSet management, providing controlled updates and rollback capabilities that maintain service availability during infrastructure changes.

Deployments manage pod lifecycle through ReplicaSets, providing declarative updates, rollback capabilities, and versioning. The deployment controller maintains desired state by creating, updating, or deleting ReplicaSets as specifications change.

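Rolling updates replace pods incrementally so the service stays available throughout the update. Two parameters control the pace, each given as an absolute number or a percentage of replicas: maxSurge is how many pods above the desired count may exist during the update, and maxUnavailable is how many pods may be unavailable. A maxSurge above 0 allows temporary overcapacity; with maxUnavailable set to 0, an old pod is terminated only after its replacement passes its readiness check, so available capacity never drops below the desired replica count. The two values cannot both be 0.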

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
  namespace: ai-inference
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
        version: v1.4.2
    spec:
      containers:
        - name: inference
          image: inference/model-server:v1.4.2
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "4Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi"
              cpu: "2000m"
              nvidia.com/gpu: 1
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          env:
            - name: MODEL_VERSION
              value: "v1.4.2"
            - name: INFERENCE_BATCH_SIZE
              value: "32"
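
To make the effect of the two strategy settings concrete, the sketch below works out the pod-count bounds that the manifest above implies. The function name rollout_bounds is hypothetical, and it handles absolute values only; percentages, which Kubernetes also accepts, are left out.

```shell
# rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
# Prints the most pods that may exist and the fewest that must stay
# available while a RollingUpdate is in progress (absolute values only).
rollout_bounds() {
  replicas=$1
  max_surge=$2
  max_unavailable=$3
  echo "max_total=$((replicas + max_surge)) min_available=$((replicas - max_unavailable))"
}

# Values from the inference-server manifest: 6 replicas, surge 2, unavailable 0
rollout_bounds 6 2 0
# prints: max_total=8 min_available=6
```

At the peak of a rollout this manifest can therefore run 8 pods, each with a limit of one nvidia.com/gpu, so the cluster needs two spare GPUs for the surge pods to schedule. If they stay Pending, the rollout cannot proceed, because maxUnavailable: 0 prevents any old pod from being removed first.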

Rollbacks restore a previous pod template. The revisionHistoryLimit field (default 10) controls how many old ReplicaSets Kubernetes retains for rollback; setting it to 0 removes the ability to roll back. kubectl rollout history lists the available revisions, and kubectl rollout undo restores the previous revision, or a specific one when --to-revision is given.

# View deployment history
kubectl rollout history deployment inference-server -n ai-inference

# Roll back to the previous revision
kubectl rollout undo deployment inference-server -n ai-inference

# Roll back to a specific revision
kubectl rollout undo deployment inference-server -n ai-inference \
  --to-revision=2

# Watch rollout progress
kubectl rollout status deployment inference-server -n ai-inference
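
Before rolling back to a numbered revision it can help to inspect what that revision contains, and to set the history limit explicitly rather than rely on the default. The following is a minimal sketch: the helper names show_revision and set_history_limit and the NAMESPACE variable are hypothetical, while the kubectl subcommands and flags are standard.

```shell
# Namespace from the manifest; override by exporting NAMESPACE.
NAMESPACE="${NAMESPACE:-ai-inference}"

# show_revision N: print the pod template stored for revision N.
show_revision() {
  kubectl -n "$NAMESPACE" rollout history deployment/inference-server --revision="$1"
}

# set_history_limit N: retain at most N old ReplicaSets for rollback.
set_history_limit() {
  kubectl -n "$NAMESPACE" patch deployment inference-server \
    --type merge -p "{\"spec\":{\"revisionHistoryLimit\":$1}}"
}

# Usage (requires cluster access):
#   show_revision 2
#   set_history_limit 5
```

Patching revisionHistoryLimit changes a field outside the pod template, so it should not start a new rollout.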

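Any change to the pod template (.spec.template) produces a new pod-template-hash and therefore a new ReplicaSet; changes outside the template, such as the replica count, do not. The deployment.kubernetes.io/revision annotation on the Deployment records the current revision, and each retained ReplicaSet stores the template for its own revision. The revision number increments on every template change, whether or not the resulting rollout succeeds.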
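
The relationship between template hashes, ReplicaSets, and revisions can be observed directly on a running cluster. This is a sketch with hypothetical helper names (list_replicasets, current_revision) and a hypothetical NAMESPACE variable; the label and annotation keys are the ones Kubernetes sets on Deployment-managed objects.

```shell
# Namespace from the manifest; override by exporting NAMESPACE.
NAMESPACE="${NAMESPACE:-ai-inference}"

# List the ReplicaSets behind the Deployment, with each one's
# pod-template-hash label shown as an extra column.
list_replicasets() {
  kubectl -n "$NAMESPACE" get replicasets -l app=inference-server -L pod-template-hash
}

# Print the Deployment's current revision number from its annotation.
current_revision() {
  kubectl -n "$NAMESPACE" get deployment inference-server \
    -o jsonpath='{.metadata.annotations.deployment\.kubernetes\.io/revision}'
}

# Usage (requires cluster access):
#   list_replicasets
#   current_revision
```

After an image change, list_replicasets should show one more ReplicaSet with a different hash, and current_revision should report a higher number.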

Deployments scale independently of updates: the scale subcommand, or an edit to spec.replicas, changes the pod count without creating a new revision. When a HorizontalPodAutoscaler targets the Deployment, it manages the replica count and overwrites manually set values.
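
A minimal sketch of manual scaling follows, together with a check for autoscalers that would override it. The helper names scale_to and list_autoscalers and the NAMESPACE variable are hypothetical; the underlying kubectl commands are standard.

```shell
# Namespace from the manifest; override by exporting NAMESPACE.
NAMESPACE="${NAMESPACE:-ai-inference}"

# scale_to N: set the replica count directly (no new revision is created).
scale_to() {
  kubectl -n "$NAMESPACE" scale deployment/inference-server --replicas="$1"
}

# List HorizontalPodAutoscalers in the namespace. If one references
# inference-server, it will overwrite a manually set replica count.
list_autoscalers() {
  kubectl -n "$NAMESPACE" get hpa
}

# Usage (requires cluster access):
#   list_autoscalers
#   scale_to 8
```

Checking for an autoscaler first avoids the confusing case where a manual scale appears to succeed and is then reverted.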

EXERCISE

Perform a complete deployment lifecycle for an inference service. Create the initial deployment, trigger a rolling update by changing the image version, observe the rollout progress, identify an issue requiring rollback, and execute the rollback. Document the rollout status output at each stage.

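# Initial deployment (the manifest sets namespace ai-inference, which must already exist)
kubectl apply -f deployment.yaml
# --watch blocks until interrupted; press Ctrl-C once all pods are Ready
kubectl get pods -n ai-inference -l app=inference-server --watch

# Trigger update (changes only the image; the version label and MODEL_VERSION stay at v1.4.2)
kubectl set image deployment/inference-server -n ai-inference \
  inference=inference/model-server:v1.5.0

# Observe rollout
kubectl rollout status deployment/inference-server -n ai-inference

# Simulate rollback scenario
kubectl rollout history deployment/inference-server -n ai-inference
kubectl rollout undo deployment/inference-server -n ai-inference

# Verify rollback
kubectl describe deployment inference-server -n ai-inference \
  | grep -A5 "Annotations:"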
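
The exercise asks for the rollout status output at each stage. One way to capture it is sketched below; the helper name record_stage, the NAMESPACE and LOG_FILE variables, and the 120-second timeout are assumptions chosen for illustration.

```shell
# Namespace from the manifest and a log file path; both can be overridden.
NAMESPACE="${NAMESPACE:-ai-inference}"
LOG_FILE="${LOG_FILE:-rollout-log.txt}"

# record_stage LABEL: append a labelled rollout status snapshot to the log.
# Returns non-zero if the rollout has not completed within the timeout.
record_stage() {
  {
    echo "== $1"
    kubectl -n "$NAMESPACE" rollout status deployment/inference-server --timeout=120s
  } >> "$LOG_FILE" 2>&1
}

# Usage (requires cluster access), one call after each step of the exercise:
#   record_stage "initial deployment"
#   record_stage "after image update"
#   record_stage "after rollback"
```

Because stderr is redirected into the same file, a stalled rollout leaves its timeout message in the log next to the stage label.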