What this does

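This guide configures Pod Disruption Budgets (PDBs) for AI inference and agent services to prevent cascading failures during voluntary disruptions: node drains, cluster autoscaler scale-down, and rolling node upgrades. A PDB limits how many pods of a workload can be down at once because of voluntary evictions. It is expressed either as minAvailable (the number or percentage of pods that must stay available) or as maxUnavailable (the number or percentage that may be unavailable). It does not protect against involuntary disruptions such as node crashes, and Deployment rolling updates are governed by the Deployment's own strategy rather than by the PDB. For AI workloads this matters because model loading can take minutes, so losing all replicas at once causes extended downtime.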

Steps

Check the current replica count and topology spread:

kubectl get deployment ai-inference -o json | jq '{replicas: .spec.replicas, spread: .spec.template.spec.topologySpreadConstraints}'

Expected output: the replica count and any topology spread constraints in effect.

Decide on the PDB strategy. For stateless inference (replicas are interchangeable), use maxUnavailable. For stateful agent services with conversation affinity, use minAvailable:
- maxUnavailable: 1 allows at most 1 pod to be disrupted at a time (good for 2-3 replica setups)
- minAvailable: 1 keeps at least 1 pod available during voluntary disruptions (good for 2-replica setups with critical availability)
- For 5+ replicas, use a percentage such as minAvailable: 50% or maxUnavailable: 25% so the budget scales with the replica count
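
When the budget is given as a percentage, Kubernetes rounds up to a whole number of pods, both for minAvailable and for maxUnavailable. The sketch below reproduces that arithmetic in plain shell so the effect of a percentage on a given replica count can be checked before applying it. The function names are illustrative and not part of kubectl, and the results assume all replicas are healthy.

```shell
# ceil_percent REPLICAS PERCENT
# Prints ceil(REPLICAS * PERCENT / 100) using integer arithmetic.
ceil_percent() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

# allowed_for_min_available_pct REPLICAS PERCENT
# Pods that may be evicted when the PDB sets minAvailable to PERCENT%.
allowed_for_min_available_pct() {
  min_available=$(ceil_percent "$1" "$2")
  echo $(( $1 - min_available ))
}

# allowed_for_max_unavailable_pct REPLICAS PERCENT
# Pods that may be evicted when the PDB sets maxUnavailable to PERCENT%.
allowed_for_max_unavailable_pct() {
  ceil_percent "$1" "$2"
}

allowed_for_min_available_pct 5 50     # prints 2 (3 of 5 must stay available)
allowed_for_max_unavailable_pct 5 25   # prints 2 (1.25 rounds up to 2)
```

With 5 replicas, minAvailable: 50% and minAvailable: 60% both require 3 available pods, so both allow 2 evictions at a time.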

Create the PDB manifest ai-inference-pdb.yaml:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-inference-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: vllm

For a larger deployment with 5+ replicas, use minAvailable with a percentage:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-agent-pdb
spec:
  minAvailable: 60%
  selector:
    matchLabels:
      app: ai-agent
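
A PDB only protects the pods its selector matches, and a selector that matches nothing is accepted without error. The following sketch counts the pods behind a label selector so the result can be compared with the Deployment's replica count. It assumes the pods actually carry the app=vllm or app=ai-agent labels used in the manifests above; count_selected_pods is a hypothetical helper name.

```shell
# count_selected_pods SELECTOR
# Prints how many pods in the current namespace match the label selector.
count_selected_pods() {
  kubectl get pods -l "$1" --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

Running `count_selected_pods app=vllm` should print the same number as the replica count of the ai-inference Deployment. A result of 0 suggests the selector and the pod labels do not match.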

Apply and verify the PDB:
```
kubectl apply -f ai-inference-pdb.yaml
kubectl get pdb ai-inference-pdb
```
Expected output: the ALLOWED DISRUPTIONS column shows how many pods can currently be evicted (e.g., 1 for a 2-replica deployment with maxUnavailable: 1 and both pods ready).

Test the PDB by simulating a node drain. First, record which nodes the pods run on, then dry-run the drain of one of them:
```
kubectl get pods -l app=vllm -o wide
kubectl drain <node-name> --ignore-daemonsets --dry-run=client
```
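Expected: with --dry-run=client, kubectl lists the pods it would evict without calling the Eviction API, so the PDB itself is not evaluated at this stage. In a real drain, the API server rejects any eviction that would violate the budget, and kubectl retries until the budget allows it.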
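
Because the client-side dry run does not consult the budget, it can help to read the PDB status directly before starting a real drain. This is a minimal sketch that assumes the PDB name from this guide; disruptions_allowed and safe_to_drain are hypothetical helper names.

```shell
# disruptions_allowed PDB_NAME
# Prints the PDB's current .status.disruptionsAllowed value.
disruptions_allowed() {
  kubectl get pdb "$1" -o jsonpath='{.status.disruptionsAllowed}'
}

# safe_to_drain PDB_NAME
# Succeeds only when the PDB currently allows at least one eviction.
# Returns 2 if the PDB status could not be read.
safe_to_drain() {
  allowed=$(disruptions_allowed "$1") || return 2
  [ "${allowed:-0}" -ge 1 ]
}
```

One way to use it is `safe_to_drain ai-inference-pdb && kubectl drain <node-name> --ignore-daemonsets`. A drain started with no budget left would not violate the PDB, but it would sit retrying until a pod becomes healthy again.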
For rolling upgrades, set the Deployment strategy so that it is at least as strict as the PDB. Rolling updates are carried out by the Deployment controller rather than through the Eviction API, so the PDB does not limit them:
```
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```
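
The two strategy fields bound the rollout from both sides: the Deployment keeps at least replicas minus maxUnavailable pods available and runs at most replicas plus maxSurge pods in total. The sketch below works that out for absolute values only (percentages are not handled); rollout_bounds is a hypothetical helper name.

```shell
# rollout_bounds REPLICAS MAX_UNAVAILABLE MAX_SURGE
# Prints the minimum number of available pods and the maximum total
# number of pods during a rolling update (absolute values only).
rollout_bounds() {
  echo "min_available=$(( $1 - $2 )) max_pods=$(( $1 + $3 ))"
}

rollout_bounds 3 0 1   # prints min_available=3 max_pods=4
```

With maxUnavailable: 0 and maxSurge: 1, the cluster needs schedulable capacity for one extra pod during the rollout. For GPU-backed inference, that typically means one spare GPU slot.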
During a model upgrade, update the Deployment image and monitor the PDB:
```
kubectl set image deployment/ai-inference vllm=vllm/vllm-openai:v0.5.0
kubectl get pdb ai-inference-pdb -w
```
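Expected: with maxUnavailable: 0 and maxSurge: 1 in the Deployment strategy, each old pod is removed only after its replacement is Ready, so the number of ready replicas does not drop below the desired count and ALLOWED DISRUPTIONS stays at 1 or higher.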
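
Since model loading is slow, a rollout can stall for a long time if the new image never becomes Ready. The following sketch wraps the upgrade with a timeout and rolls back when the rollout does not finish. It is one possible approach rather than part of the procedure above; upgrade_or_rollback is a hypothetical helper, and the default timeout is an arbitrary assumption.

```shell
# upgrade_or_rollback DEPLOYMENT CONTAINER IMAGE [TIMEOUT]
# Sets the new image, waits for the rollout to finish, and undoes the
# rollout if it does not complete within TIMEOUT (default 15m, assumed).
upgrade_or_rollback() {
  deployment=$1
  container=$2
  image=$3
  timeout=${4:-15m}
  kubectl set image "deployment/$deployment" "$container=$image" || return 1
  if ! kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    kubectl rollout undo "deployment/$deployment"
    return 1
  fi
}
```

For the Deployment in this guide, the call would be `upgrade_or_rollback ai-inference vllm vllm/vllm-openai:v0.5.0 20m`.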

Verification

kubectl get pdb ai-inference-pdb -o json | jq '{maxUnavailable: .spec.maxUnavailable, allowedDisruptions: .status.disruptionsAllowed, currentHealthy: .status.currentHealthy, desiredHealthy: .status.desiredHealthy}'

Expected output: JSON showing the PDB configuration and current status, with currentHealthy >= desiredHealthy and allowedDisruptions of at least 1 when all replicas are ready.

Common failures

PDB blocks all voluntary disruptions: if minAvailable equals the total replica count (or maxUnavailable is 0), zero disruptions are allowed, so node drains and cluster autoscaler scale-down stall. Set minAvailable to at most total replicas minus 1: for a 3-replica deployment, use minAvailable: 2.
PDB not enforced during rolling updates: the Deployment controller manages rolling updates separately from the Eviction API, so the PDB does not apply to them. Set the Deployment's maxUnavailable equal to or less than the PDB's maxUnavailable for consistency.
Fewer disruptions allowed than expected: pods that are already unhealthy (not Ready) count against the budget. If a pod fails its readiness checks on its own, it uses up part of maxUnavailable even though no planned disruption took place, and ALLOWED DISRUPTIONS can drop to 0 until the pod recovers.
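
The first and third failures follow from how the budget is computed: the number of allowed disruptions is roughly currentHealthy minus desiredHealthy, never below zero. The sketch below reproduces that calculation with values typed in by hand; remaining_budget is a hypothetical helper, and the real controller also accounts for evictions that are still in progress.

```shell
# remaining_budget CURRENT_HEALTHY DESIRED_HEALTHY
# Prints max(0, CURRENT_HEALTHY - DESIRED_HEALTHY).
remaining_budget() {
  allowed=$(( $1 - $2 ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

remaining_budget 3 3   # minAvailable: 3 on 3 replicas -> prints 0
remaining_budget 3 2   # maxUnavailable: 1, all 3 pods ready -> prints 1
remaining_budget 2 2   # maxUnavailable: 1, one pod not ready -> prints 0
```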
