09. Kubernetes Deployments
Deployments manage pod lifecycle through ReplicaSets, providing declarative updates, rollback capabilities, and versioning. The deployment controller maintains desired state by creating, updating, or deleting ReplicaSets as specifications change.
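Rolling updates replace pods incrementally so the service stays available throughout the update. The maxSurge and maxUnavailable parameters control the pace: maxSurge is the number of pods that may exist above the desired replica count during the update, and maxUnavailable is the number of pods that may be unavailable below it. Both accept an absolute number or a percentage of replicas, and they cannot both be zero. A maxSurge greater than 0 allows temporary overcapacity; maxUnavailable=0 means the controller scales down old pods only as new pods pass their readiness checks, so available capacity never drops below the desired replica count.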
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
  namespace: ai-inference
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
        version: v1.4.2
    spec:
      containers:
        - name: inference
          image: inference/model-server:v1.4.2
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "4Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi"
              cpu: "2000m"
              nvidia.com/gpu: 1
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          env:
            - name: MODEL_VERSION
              value: "v1.4.2"
            - name: INFERENCE_BATCH_SIZE
              value: "32"
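To make the effect of maxSurge and maxUnavailable concrete, the following is a minimal arithmetic sketch using the values from the manifest above. The variable names are illustrative only, and the calculation covers absolute values, not percentages.

```shell
# Values copied from the manifest above.
replicas=6
max_surge=2
max_unavailable=0

# Upper bound on total pods (old + new) at any moment during the rollout.
max_total_pods=$((replicas + max_surge))
# Lower bound on pods that must stay available at any moment.
min_available_pods=$((replicas - max_unavailable))

echo "max total pods during rollout: ${max_total_pods}"
echo "min available pods during rollout: ${min_available_pods}"
```

With these settings the rollout may briefly run 8 pods while never dropping below 6 available. Because each pod requests one GPU, the cluster needs room to schedule up to 8 GPUs during the update; if that capacity is missing, the surge pods stay Pending and the rollout stalls.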
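Rollbacks restore a previous pod template. The revisionHistoryLimit field controls how many old ReplicaSets Kubernetes retains for rollback (the default is 10); a revision whose ReplicaSet has been cleaned up can no longer be restored. kubectl rollout history lists the available revisions, and kubectl rollout undo applies either the previous revision or the one named with --to-revision.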
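# The manifest places the deployment in the ai-inference namespace, so pass -n on each command
# View deployment history
kubectl rollout history deployment inference-server -n ai-inference
# Roll back to the previous revision
kubectl rollout undo deployment inference-server -n ai-inference
# Roll back to a specific revision
kubectl rollout undo deployment inference-server -n ai-inference \
  --to-revision=2
# Watch rollout progress
kubectl rollout status deployment inference-server -n ai-inference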
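As a sketch of how the revision history might be managed in practice, the helpers below set revisionHistoryLimit and print the pod template recorded for a single revision. The function names are hypothetical, the limit passed in is whatever value suits the cluster, and the ai-inference namespace is taken from the manifest.

```shell
# Set how many old ReplicaSets the deployment keeps for rollback.
# Usage: set_history_limit 5
set_history_limit() {
  kubectl patch deployment inference-server -n ai-inference \
    --type merge -p "{\"spec\":{\"revisionHistoryLimit\":$1}}"
}

# Show the pod template stored for one revision, to check it before rolling back.
# Usage: show_revision 2
show_revision() {
  kubectl rollout history deployment inference-server -n ai-inference \
    --revision="$1"
}
```

Inspecting a revision with show_revision before running kubectl rollout undo --to-revision helps confirm that the target revision carries the expected image.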
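A change to the pod template (.spec.template) produces a new pod-template-hash and triggers creation of a new ReplicaSet; changes outside the template, such as the replica count, do not. The deployment.kubernetes.io/revision annotation on the deployment and on each of its ReplicaSets records the revision number, while the retained ReplicaSets themselves hold the earlier template specifications. The revision number increments each time a rollout is triggered, whether or not that rollout succeeds, and a rollback counts as a new rollout: the restored template receives the next revision number.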
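The revision annotation and the template hash can be inspected directly. The following is a small sketch with hypothetical helper names; it assumes the deployment and labels from the manifest above.

```shell
# Print the deployment's current revision number from its annotation.
# Dots inside the annotation key are escaped for kubectl's JSONPath parser.
current_revision() {
  kubectl get deployment inference-server -n ai-inference \
    -o jsonpath='{.metadata.annotations.deployment\.kubernetes\.io/revision}'
}

# List the deployment's ReplicaSets with pod-template-hash shown as a column.
list_replicasets() {
  kubectl get replicasets -n ai-inference -l app=inference-server \
    -L pod-template-hash
}
```

After an image change, list_replicasets should show an additional ReplicaSet with a different hash, and current_revision should report a higher number.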
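Deployments scale independently of updates, either through the kubectl scale subcommand or by changing spec.replicas; because scaling leaves the pod template untouched, it does not create a new revision. When a HorizontalPodAutoscaler targets the deployment, it sets the replica count itself and overrides manual scaling.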
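Both scaling paths can be sketched with kubectl. The helper names and the autoscaling bounds (4 to 12 replicas, 70% CPU) are assumptions chosen for illustration, and CPU utilization may be a poor signal for a GPU-bound inference service. Flag names for kubectl autoscale can also differ between kubectl versions.

```shell
# Manually set the replica count; the pod template and revision are unchanged.
# Usage: scale_to 8
scale_to() {
  kubectl scale deployment inference-server -n ai-inference --replicas="$1"
}

# Create a HorizontalPodAutoscaler targeting the deployment.
# Once it exists, it controls the replica count and overrides scale_to.
enable_autoscaling() {
  kubectl autoscale deployment inference-server -n ai-inference \
    --min=4 --max=12 --cpu-percent=70
}
```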
Perform a complete deployment lifecycle for an inference service. Create the initial deployment, trigger a rolling update by changing the image version, observe the rollout progress, identify an issue requiring rollback, and execute the rollback. Document the rollout status output at each stage.
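# Initial deployment (the manifest sets the ai-inference namespace, which must already exist)
kubectl apply -f deployment.yaml
# Watch pods start; press Ctrl-C to stop watching
kubectl get pods -n ai-inference -l app=inference-server --watch
# Trigger update
kubectl set image deployment/inference-server -n ai-inference \
  inference=inference/model-server:v1.5.0
# Observe rollout
kubectl rollout status deployment/inference-server -n ai-inference
# Simulate rollback scenario
kubectl rollout history deployment/inference-server -n ai-inference
kubectl rollout undo deployment/inference-server -n ai-inference
# Verify rollback: check the revision annotation and the restored image
kubectl describe deployment inference-server -n ai-inference \
  | grep -A5 "Annotations:"
kubectl get deployment inference-server -n ai-inference \
  -o jsonpath='{.spec.template.spec.containers[0].image}'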
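One way to handle the "identify an issue requiring rollback" step is to treat a rollout that does not finish in time as the failure signal. The helper below is a sketch under that assumption; the function name and the 300-second timeout are illustrative, and a real check might also look at application metrics.

```shell
# Wait for the rollout to finish; if it fails or times out, undo it.
# Returns non-zero when a rollback was triggered.
rollout_or_undo() {
  if ! kubectl rollout status deployment/inference-server -n ai-inference --timeout=300s; then
    echo "rollout did not complete; rolling back" >&2
    kubectl rollout undo deployment/inference-server -n ai-inference
    return 1
  fi
}
```

Running rollout_or_undo right after kubectl set image gives a single step that either confirms the update or restores the previous revision.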