How to implement healthchecks and restart policies for AI containers
AI services running in Docker
What this does
This guide configures Docker healthcheck probes and restart policies for AI inference containers. Healthchecks detect when a model server is truly ready (model loaded, API responding) versus merely running (process alive). Restart policies recover automatically from crashes and GPU OOM events that terminate the process. Hung inference loops do not exit, so they need the healthcheck combined with an external watcher, as covered in the steps. Together these minimize downtime for AI services in both development and production.
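To make the "ready versus merely running" distinction concrete, here is a minimal sketch of a service that exposes both kinds of endpoint. It uses only the Python standard library as a stand-in for a real framework such as FastAPI. The `MODEL_READY` flag, the `health_response` helper, and the JSON bodies are hypothetical names chosen for illustration, not part of any model server's API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: the model-loading code would call MODEL_READY.set()
# once the weights are in memory and inference can be served.
MODEL_READY = threading.Event()


def health_response(path: str, model_ready: bool) -> tuple:
    """Map a request path to an (HTTP status, JSON body) pair."""
    if path == "/health":
        # Liveness: the process is up, regardless of model state.
        return 200, {"status": "alive"}
    if path == "/health/ready":
        # Readiness: only report success once the model is loaded.
        if model_ready:
            return 200, {"status": "ready"}
        return 503, {"status": "loading"}
    return 404, {"error": "not found"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = health_response(self.path, MODEL_READY.is_set())
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Serve the health endpoints until the process is stopped."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

A Docker healthcheck pointed at `/health/ready` stays failing while the model loads, whereas one pointed at `/health` would report success as soon as the process starts.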
Steps
1. Identify the health endpoint for the AI service:

   - vLLM: `GET /health` returns 200 when the model is loaded.
   - Ollama: `GET /api/tags` returns a list of available models.
   - Custom FastAPI service: a `/health/ready` endpoint.

2. Add a healthcheck to the Docker Compose service definition:

       services:
         vllm:
           image: vllm/vllm-openai:latest
           healthcheck:
             test: ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
             interval: 15s
             timeout: 5s
             retries: 3
             start_period: 120s

   The `start_period` of 120 seconds accounts for model loading time; probe failures inside this window do not count toward `retries`. Adjust it based on model size, for example 60s for 1B models and 300s for 70B models.

3. For containers without curl, install it in the image and declare the healthcheck in the Dockerfile:

       FROM ollama/ollama:latest
       # Install curl for the probe and clean the apt cache to keep the layer small
       RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
       # Healthy once the Ollama API answers on its default port
       HEALTHCHECK --interval=10s --timeout=5s --start-period=60s --retries=3 \
         CMD curl -f http://localhost:11434/api/tags || exit 1

4. Configure the restart policy. Use `unless-stopped` for production services:

       services:
         vllm:
           restart: unless-stopped

   For development, use `on-failure:3` to limit restart attempts and avoid infinite loops:

       restart: on-failure:3

5. Combine the healthcheck with the restart policy to handle hung services. A restart policy only acts when the container exits, so a hung container still shows as "running." The healthcheck marks the hung container as unhealthy, but plain Docker and Docker Compose do not restart a container on healthcheck failure alone. Add an external watcher that restarts unhealthy containers. The service below is one possible approach; it assumes the watcher is allowed to mount the Docker socket, which gives it full control over the Docker daemon:

       # Goes under the top-level services: key
       watcher:
         image: docker:cli
         restart: unless-stopped
         volumes:
           - /var/run/docker.sock:/var/run/docker.sock
         entrypoint: ["/bin/sh", "-c"]
         command:
           - |
             while true; do
               # Restart any vllm container that Docker has marked unhealthy
               docker ps -q --filter health=unhealthy \
                 --filter label=com.docker.compose.service=vllm | xargs -r docker restart
               sleep 30
             done

6. View container health status:

       docker compose ps --format "table {{.Name}}\t{{.Status}}"

   Expected output: a table showing `(healthy)` next to each AI service.

7. Simulate a hang and observe the healthcheck transition. Send the signal from the host, because a SIGSTOP sent to PID 1 from inside the container is ignored:

       docker compose kill -s SIGSTOP vllm   # freezes the container's main process
       docker compose ps

   After roughly 45 to 60 seconds (three consecutive failed probes at a 15s interval, each running until the 5s timeout), the status changes to `(unhealthy)`. Resume the process with `docker compose kill -s SIGCONT vllm`.
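The delay before a hung container is marked unhealthy follows from the `interval`, `timeout`, and `retries` settings. The sketch below estimates it under two assumptions: every probe against the hung server runs until the timeout, and Docker starts each probe one interval after the previous one completes. The function name `unhealthy_window` is made up for this example.

```python
def unhealthy_window(interval: float, timeout: float, retries: int) -> tuple:
    """Return (shortest, longest) seconds from a hang to the unhealthy status.

    Assumes each probe against the hung service runs for the full timeout.
    """
    # retries probes that each time out, separated by (retries - 1) intervals.
    fixed = retries * timeout + (retries - 1) * interval
    # The hang can begin anywhere between two probes, adding up to one interval.
    return fixed, fixed + interval


shortest, longest = unhealthy_window(interval=15, timeout=5, retries=3)
print(f"unhealthy after {shortest:.0f}s to {longest:.0f}s")
# prints: unhealthy after 45s to 60s
```

Lowering `interval` shortens detection but runs the probe more often, which matters if the health endpoint competes with inference for resources.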
Verification
docker inspect $(docker compose ps -q vllm) | jq '.[0].State.Health.Status'
Expected output: "healthy".
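The same check can be scripted, for example in a deployment script that should block until the model has loaded. The following is a rough sketch rather than a definitive tool: `docker_health` and `wait_until_healthy` are hypothetical helper names, the defaults are arbitrary, and the `docker inspect` template only works for containers that define a healthcheck.

```python
import subprocess
import time
from typing import Callable


def docker_health(container: str) -> str:
    """Return Docker's health status: starting, healthy, or unhealthy."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(
    get_status: Callable[[], str],
    deadline_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until it reports healthy; give up on unhealthy or deadline."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy" or waited >= deadline_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

With a container ID taken from `docker compose ps -q vllm`, a call would look like `wait_until_healthy(lambda: docker_health(container_id))`. Passing the status function as a parameter keeps the polling logic testable without a running Docker daemon.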
Common failures
- Container never becomes healthy: the `start_period` is shorter than the model loading time. Increase it, or check `docker compose logs vllm` for "Uvicorn running" messages, which indicate the server is ready for health checks.
- Healthcheck succeeds but service is not serving requests: the health endpoint returns 200 during model loading. Use a `/health/ready` endpoint that validates the model is actually loaded and ready for inference.
- Restart policy triggers infinite loop: `restart: always` combined with an immediate crash will cycle indefinitely. Use `on-failure:5` to cap the attempts, and consider a `deploy.resources.limits.memory` constraint if the crashes are memory-related.
- curl not found in container: on Debian- or Ubuntu-based images, add to the Dockerfile: `RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*`.
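If installing curl is not an option and the image already ships a Python interpreter (an assumption to verify for your image), the probe can be written with the standard library instead. This is a minimal sketch: the `probe` and `main` names and the default URL are illustrative, not part of Docker or any model server.

```python
import sys
import urllib.error
import urllib.request
from typing import List, Optional


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Refused connections, timeouts, and HTTP error statuses all land here.
        return False


def main(argv: Optional[List[str]] = None) -> int:
    """Return 0 (healthy) or 1 (unhealthy), the exit codes Docker expects."""
    args = sys.argv[1:] if argv is None else argv
    # Assumed default: the vLLM health URL used elsewhere in this guide.
    url = args[0] if args else "http://localhost:8000/health"
    return 0 if probe(url) else 1
```

Saved as a hypothetical `/healthcheck.py` with a final line `raise SystemExit(main())`, it could replace the curl command in Compose as `test: ["CMD", "python3", "/healthcheck.py"]`.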