What this does

This guide configures Docker healthcheck probes and restart policies for AI inference containers. Healthchecks detect when a model server is truly ready (model loaded, API responding) versus merely running (process alive). Restart policies ensure automatic recovery from crashes, GPU OOM events, and hung inference loops. Together they minimize downtime for AI services in both development and production.

Steps

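Identify the health endpoint for the AI service. For vLLM: GET /health returns 200 when the model is loaded. For Ollama: GET /api/tags returns the list of locally available models, which confirms the API is responding but not that a particular model is loaded into memory. For a custom FastAPI service: expose a dedicated endpoint such as /health/ready.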
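
The difference between "running" and "ready" can be sketched with the Python standard library alone. This is an illustration under assumptions, not how vLLM or Ollama implement their endpoints: the /health/live path, the model_ready flag, and the start_health_server helper are hypothetical names. The readiness endpoint answers 503 until the loader sets the flag, so a curl -f probe fails during model loading.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: the model loader calls model_ready.set() when it finishes.
model_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            # Liveness: the process is up and can answer HTTP.
            self._reply(200, b"alive")
        elif self.path == "/health/ready":
            # Readiness: 503 until the model is loaded, then 200.
            if model_ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"loading")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the service logs


def start_health_server(port=0):
    """Serve the health endpoints on a background thread (port=0 picks a free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing the Docker healthcheck at the readiness path rather than the liveness path is what keeps the container in the starting state until inference can actually be served.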

Add a healthcheck to the Docker Compose service definition:

```
services:
  vllm:
    image: vllm/vllm-openai:latest
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 120s
```

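The start_period of 120 seconds accounts for model loading time: probe failures during this window do not count toward retries, so the container stays in the starting state instead of being marked unhealthy. Adjust it to the model size. As rough starting points, use about 60s for 1B models and 300s for 70B models, then tune against the load time you observe in the logs.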

For images that do not ship curl, install it and declare the healthcheck in the Dockerfile:

```
FROM ollama/ollama:latest
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
HEALTHCHECK --interval=10s --timeout=5s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:11434/api/tags || exit 1
```
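
When installing curl is not desirable and the image already ships a Python interpreter (an assumption; check your image), the probe can be written with the standard library alone. A minimal sketch follows. The file name healthcheck.py and the functions probe and main are hypothetical, not part of any image.

```python
import http.client
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return True if a GET to url answers with a 2xx status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed URL.
        return False


def main(argv):
    """Map the probe result to the exit code Docker expects: 0 healthy, 1 unhealthy."""
    # The default URL mirrors the Ollama example above.
    url = argv[1] if len(argv) > 1 else "http://localhost:11434/api/tags"
    return 0 if probe(url) else 1
```

To use it as a probe, add `import sys` and a final `sys.exit(main(sys.argv))` line, copy the file into the image, and point the healthcheck at it, for example `test: ["CMD", "python3", "/healthcheck.py", "http://localhost:8000/health"]`.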

Configure the restart policy. Use unless-stopped for production services:
```
services:
  vllm:
    restart: unless-stopped
```
For development, use on-failure:3 to limit restart attempts and avoid infinite loops:
```
    restart: on-failure:3
```
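Combine the healthcheck with a restart mechanism to handle hung services. A restart policy only acts when the container exits, so a hung container still shows as "running." The healthcheck detects the hang and marks the container unhealthy, but plain Docker and Docker Compose do not restart a container on health check failure alone. Add an external watcher that restarts unhealthy containers. The watcher below reads Docker's own health status, so it respects start_period. It assumes access to the host's Docker socket, which grants full control of the Docker daemon. The alpine image is replaced with docker:cli because alpine ships neither curl nor the docker CLI:
```
watcher:
  image: docker:cli
  restart: unless-stopped
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  entrypoint: |
    /bin/sh -c 'while true; do
      for id in $$(docker ps -q --filter health=unhealthy --filter label=com.docker.compose.service=vllm); do
        echo "Container $$id is unhealthy, restarting";
        docker restart "$$id";
      done;
      sleep 30;
    done'
```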
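
The watcher logic can also be sketched in Python, which makes the "restart only after repeated failures" rule explicit. This is a sketch under assumptions: FailureTracker, restart_container, and watch are hypothetical names, the docker CLI and Docker socket must be reachable from wherever it runs, and probe is any callable returning True or False (an HTTP check, or a read of Docker's health status). If probe is a raw HTTP check, keep threshold times interval longer than the model load time so the watcher does not restart a container that is still loading.

```python
import subprocess
import time


class FailureTracker:
    """Counts consecutive probe failures and signals when a threshold is hit."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy):
        """Record one probe result; return True when a restart is due."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # start counting again after the restart
            return True
        return False


def restart_container(name):
    # Assumption: the docker CLI is installed and the Docker socket is reachable.
    subprocess.run(["docker", "restart", name], check=True)


def watch(probe, container, interval=30.0, threshold=3, restart=restart_container):
    """Poll probe() forever; restart the container after repeated failures."""
    tracker = FailureTracker(threshold)
    while True:
        if tracker.record(probe()):
            restart(container)
        time.sleep(interval)
```

Resetting the counter on any success means a single slow response does not accumulate toward a restart.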
View container health status:
```
docker compose ps --format "table {{.Name}}\t{{.Status}}"
```
Expected output: a table showing (healthy) next to each AI service.
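Simulate a hung server and observe the healthcheck transition. Send SIGSTOP from the host; the same signal sent from inside the container (docker compose exec vllm kill -STOP 1) is ignored, because the kernel does not deliver SIGSTOP to PID 1 from within its own PID namespace:
```
docker compose kill -s SIGSTOP vllm   # freezes the server process
docker compose ps
docker compose kill -s SIGCONT vllm   # resumes it afterwards
```
Because each probe against the frozen server runs until the 5s timeout, the status changes to (unhealthy) after roughly 45 to 60 seconds: three consecutive failures spaced by the 15s interval.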
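
The detection delay follows from interval, timeout, and retries. The arithmetic can be sketched as below, assuming Docker schedules each check interval seconds after the previous one completes and cuts off a hung probe at timeout. The function name unhealthy_window is hypothetical.

```python
def unhealthy_window(interval, timeout, retries, probe_hangs=True):
    """Return (fastest, slowest) seconds from the fault to the unhealthy state.

    Assumes each check starts `interval` seconds after the previous check
    completes, and that a hung probe is cut off after `timeout` seconds.
    """
    # A probe against a hung server runs until the timeout; a refused
    # connection fails almost immediately.
    probe_cost = timeout if probe_hangs else 0
    # Fastest: the fault lands just before a check starts.
    fastest = retries * probe_cost + (retries - 1) * interval
    # Slowest: the fault lands just after a check has passed.
    slowest = retries * (interval + probe_cost)
    return fastest, slowest


# Values from the Compose example: interval 15s, timeout 5s, retries 3.
print(unhealthy_window(interval=15, timeout=5, retries=3))                     # (45, 60)
print(unhealthy_window(interval=15, timeout=5, retries=3, probe_hangs=False))  # (30, 45)
```

A crashed server that refuses connections is therefore flagged sooner than a frozen one, because each failing probe returns immediately instead of waiting out the timeout.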

Verification

```
docker inspect $(docker compose ps -q vllm) | jq '.[0].State.Health.Status'
```

Expected output: "healthy".
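
For scripted verification, for example in a deployment pipeline, the same status can be polled until the container reports healthy. This is a sketch under assumptions: docker_health and wait_until_healthy are hypothetical helpers, the docker CLI must be on the PATH, and the container must define a healthcheck (otherwise the inspect template has no Health field to read).

```python
import subprocess
import time


def docker_health(container):
    """Read the health status string via `docker inspect`."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(container, timeout=300.0, poll=5.0,
                       get_status=docker_health, sleep=time.sleep):
    """Poll until the container reports healthy; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status(container) == "healthy":
            return True
        sleep(poll)
    return False
```

The get_status and sleep parameters exist only so the loop can be exercised without a Docker daemon; in normal use the defaults apply and the container argument is the ID printed by docker compose ps -q vllm.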

Common failures

Container never becomes healthy: the start_period is shorter than the model loading time. Increase it, and check docker compose logs vllm for "Uvicorn running" messages, which indicate the server is ready for health checks.
Healthcheck succeeds but service is not serving requests: the health endpoint returns 200 while the model is still loading. Use a /health/ready endpoint that validates the model is actually loaded and ready for inference.
Restart policy triggers infinite loop: restart: always combined with an immediate crash will cycle indefinitely. Use on-failure:5 to cap the number of attempts, or add a deploy.resources.limits.memory constraint.
curl not found in container: add to Dockerfile: RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*.

How to implement healthchecks and restart policies for AI containers
