How to implement pod disruption budgets for AI services during upgrades
Prerequisites: a Kubernetes cluster with the AI deployments already running, plus kubectl access and jq for the commands below.
What this does
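This guide configures Pod Disruption Budgets (PDBs) for AI inference and agent services to prevent cascading failures during voluntary disruptions such as node drains (including those performed during cluster upgrades) and cluster autoscaler scale-down. A PDB limits how many pods of a workload may be down at once because of voluntary disruptions. The limit is expressed either as a minimum that must stay available (minAvailable) or as a maximum that may be unavailable (maxUnavailable). PDBs are enforced through the Eviction API, so they do not protect against involuntary disruptions such as node crashes, and Deployment rolling updates are bounded by the Deployment's own strategy settings rather than by the PDB. For AI workloads this matters because model loading takes minutes, so losing all replicas at once causes extended downtime.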
Steps
Check the current replica count and topology spread:
kubectl get deployment ai-inference -o json | jq '{replicas: .spec.replicas, spread: .spec.template.spec.topologySpreadConstraints}'
Expected output: the replica count and any topology spread constraints in effect.
Decide on the PDB strategy. For stateless inference (replicas are interchangeable), use maxUnavailable. For stateful agent services with conversation affinity, use minAvailable. A single PDB accepts only one of the two fields.
- maxUnavailable: 1 allows at most 1 pod to be disrupted at a time (good for 2-3 replica setups)
- minAvailable: 1 ensures at least 1 pod is always running (good for 2-replica setups with critical availability)
- For 5+ replicas, use a percentage such as minAvailable: 50% or maxUnavailable: 25%
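To compare candidate values before writing the manifest, the budget arithmetic can be sketched in shell. This is a minimal sketch, not the controller's implementation: the function names are hypothetical, it assumes every replica is currently healthy, and it relies on the documented behavior that PDB percentages round up to the nearest whole pod.

```shell
#!/bin/sh
# Hypothetical helpers illustrating PDB arithmetic; not part of kubectl.

# Resolve an absolute count or a percentage such as "50%" against a
# replica count. Percentages round up, as documented for PDBs.
resolve_count() {
  value=$1
  replicas=$2
  case "$value" in
    *%)
      pct=${value%\%}
      echo $(( (replicas * pct + 99) / 100 ))
      ;;
    *)
      echo "$value"
      ;;
  esac
}

# Print the minimum number of healthy pods the budget requires.
# usage: desired_healthy <replicas> <minAvailable|maxUnavailable> <value>
desired_healthy() {
  replicas=$1
  field=$2
  count=$(resolve_count "$3" "$replicas")
  if [ "$field" = "maxUnavailable" ]; then
    count=$(( replicas - count ))
  fi
  if [ "$count" -lt 0 ]; then
    count=0
  fi
  echo "$count"
}

# Print how many pods may be evicted, assuming all replicas are healthy.
# usage: allowed_disruptions <replicas> <minAvailable|maxUnavailable> <value>
allowed_disruptions() {
  replicas=$1
  needed=$(desired_healthy "$1" "$2" "$3")
  allowed=$(( replicas - needed ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Example calls:
#   allowed_disruptions 2 maxUnavailable 1    prints 1
#   allowed_disruptions 5 minAvailable 60%    prints 2
#   allowed_disruptions 3 minAvailable 3      prints 0 (drains would stall)
```

A result of 0 at full health is the signal to loosen the budget or add replicas before relying on it during a drain.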
Create the PDB manifest ai-inference-pdb.yaml:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-inference-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: vllm
For a larger deployment with 5+ replicas, use minAvailable with a percentage:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-agent-pdb
spec:
  minAvailable: 60%
  selector:
    matchLabels:
      app: ai-agent
Apply and verify the PDB:
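kubectl apply -f ai-inference-pdb.yaml
kubectl get pdb ai-inference-pdb
Expected output: the ALLOWED DISRUPTIONS column shows the number of allowed disruptions (e.g., 1 for a 2-replica deployment with maxUnavailable: 1, once both pods are ready).
Test the PDB by simulating a node drain. First, record where the pods are running, then dry-run the drain:
kubectl get pods -l app=vllm -o wide
kubectl drain <node-name> --dry-run=client
Expected: the drain command lists the pods it would evict from the node (add --ignore-daemonsets if the node runs DaemonSet-managed pods). A client-side dry run does not submit eviction requests, so the PDB is not evaluated at this point. Compare the number of app=vllm pods on the node with ALLOWED DISRUPTIONS. During a real drain, evictions that would violate the PDB are refused and retried until enough pods are healthy again.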
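Before a real drain, it can help to check whether the budget currently has any headroom. The sketch below is illustrative only: drain_can_proceed is a hypothetical helper, and the commented usage assumes the PDB name from this guide and a live cluster.

```shell
#!/bin/sh
# Hypothetical preflight helper; not part of kubectl.

# Succeeds when the PDB's .status.disruptionsAllowed value (passed as $1)
# leaves room for at least one eviction. With 0 allowed disruptions, every
# eviction of a matching pod is refused until more pods become healthy.
drain_can_proceed() {
  [ "$1" -ge 1 ]
}

# Usage against a live cluster (shown as comments, not run here):
#   allowed=$(kubectl get pdb ai-inference-pdb -o jsonpath='{.status.disruptionsAllowed}')
#   if drain_can_proceed "$allowed"; then
#     kubectl drain <node-name> --ignore-daemonsets
#   else
#     echo "PDB allows no disruptions; the drain would stall" >&2
#   fi
```

The check is a snapshot: the allowed count can change between the query and the drain, so the PDB itself remains the actual safeguard.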
For rolling upgrades, configure the Deployment strategy to protect availability. The Deployment controller does not go through the Eviction API, so a rolling update is bounded by the strategy's own maxUnavailable and maxSurge rather than by the PDB. With the settings below, each new pod must become ready before an old one is removed:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
During a model upgrade, update the Deployment image and monitor the PDB:
kubectl set image deployment/ai-inference vllm=vllm/vllm-openai:v0.5.0
kubectl get pdb ai-inference-pdb -w
Expected: the number of healthy replicas never drops below the minAvailable threshold, or equivalently the number of unavailable replicas never exceeds the maxUnavailable limit.
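The effect of the rolling update settings can be estimated ahead of time. The following is a rough sketch with hypothetical function names. It assumes the documented Deployment rounding (maxUnavailable percentages round down, maxSurge percentages round up) and ignores the edge case where both values resolve to zero.

```shell
#!/bin/sh
# Hypothetical helpers illustrating rolling update bounds; not part of kubectl.

# Resolve an absolute count or a percentage against a replica count.
# usage: scale <value> <replicas> <up|down>
scale() {
  case "$1" in
    *%)
      pct=${1%\%}
      if [ "$3" = "up" ]; then
        echo $(( ($2 * pct + 99) / 100 ))
      else
        echo $(( $2 * pct / 100 ))
      fi
      ;;
    *)
      echo "$1"
      ;;
  esac
}

# Print the fewest available pods and the most total pods during a rollout.
# usage: rollout_bounds <replicas> <maxUnavailable> <maxSurge>
rollout_bounds() {
  replicas=$1
  unavailable=$(scale "$2" "$replicas" down)
  surge=$(scale "$3" "$replicas" up)
  echo "min_available=$(( replicas - unavailable )) max_total=$(( replicas + surge ))"
}

# Example calls:
#   rollout_bounds 2 0 1        prints min_available=2 max_total=3
#   rollout_bounds 8 25% 25%    prints min_available=6 max_total=10
```

The max_total figure is worth checking against node capacity: with maxUnavailable: 0 the rollout can only progress if the cluster can schedule the surge pod, which for GPU-backed inference may require a spare accelerator.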
Verification
kubectl get pdb ai-inference-pdb -o json | jq '{maxUnavailable: .spec.maxUnavailable, allowedDisruptions: .status.disruptionsAllowed, currentHealthy: .status.currentHealthy, desiredHealthy: .status.desiredHealthy}'
Expected output: JSON showing the PDB configuration and current status, with currentHealthy >= desiredHealthy.
Common failures
- PDB blocks all voluntary disruptions: if minAvailable equals the total replica count, zero disruptions are allowed, including node drains and cluster autoscaler scale-down. Set minAvailable to total replicas minus 1; for a 3-replica deployment, use minAvailable: 2.
- PDB not enforced during rolling updates: the Deployment controller manages rolling updates separately from the Eviction API. Set the Deployment's maxUnavailable equal to or less than the PDB's maxUnavailable for consistency.
- Allowed disruptions lower than expected: pods that are already unhealthy (not ready) count against maxUnavailable. If a pod fails health checks independently, it consumes the disruption budget even though it is not a planned disruption, and ALLOWED DISRUPTIONS can drop to 0.