RUNLOCALAI · v38
© 2026 runlocalai.co · Independently operated
HOW-TO · OPS

How to implement healthchecks and restart policies for AI containers

intermediate · 15 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

AI services running in Docker

What this does

This guide configures Docker healthcheck probes and restart policies for AI inference containers. Healthchecks detect when a model server is truly ready (model loaded, API responding) versus merely running (process alive). Restart policies bring a container back after it exits, for example after a crash or a GPU out-of-memory kill. A hung inference loop never exits, so recovering from one takes a healthcheck plus a watcher that acts on the unhealthy status. Together they minimize downtime for AI services in both development and production.

Steps

  1. Identify the health endpoint for the AI service. For vLLM, GET /health returns 200 when the model is loaded. For Ollama, GET /api/tags returns the list of available models, which confirms the API is responding. For a custom FastAPI service, expose a /health/ready endpoint.

  2. Add a healthcheck to the Docker Compose service definition:

    services:
      vllm:
        image: vllm/vllm-openai:latest
        healthcheck:
          test: ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
          interval: 15s
          timeout: 5s
          retries: 3
          start_period: 120s
    

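    The start_period of 120 seconds gives the model time to load: probe failures inside that window do not count toward retries. Adjust it to the model size and storage speed, for example around 60s for a 1B model and 300s for a 70B model.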

  3. For images that ship without curl, install it in a derived Dockerfile and declare the probe with a HEALTHCHECK instruction:

    FROM ollama/ollama:latest
    RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
    HEALTHCHECK --interval=10s --timeout=5s --start-period=60s --retries=3 \
      CMD curl -f http://localhost:11434/api/tags || exit 1
    
  4. Configure the restart policy. Use unless-stopped for production services:

    services:
      vllm:
        restart: unless-stopped
    

    For development, use on-failure:3 to limit restart attempts and avoid infinite loops:

        restart: on-failure:3
    
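  5. Combine the healthcheck with a watcher to handle hung services. A restart policy only acts when the container exits, and a hung container still shows as "running." The healthcheck marks the container (unhealthy), but Docker does not restart a container on a failed health check alone. Add an external watcher that does. This sketch uses the docker:cli image and mounts the Docker socket, which gives the watcher full control of the Docker daemon, so use it only on hosts you trust:

    watcher:
      image: docker:cli
      restart: unless-stopped
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
      entrypoint:
        - /bin/sh
        - "-c"
        - |
          # Restart the vllm service's container whenever Docker marks it unhealthy.
          # "$$" escapes "$" so Compose does not interpolate it.
          while true; do
            for id in $$(docker ps -q --filter health=unhealthy --filter label=com.docker.compose.service=vllm); do
              echo "Health check failed, restarting $$id"
              docker restart "$$id"
            done
            sleep 30
          done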
    
  6. View container health status:

    docker compose ps --format "table {{.Name}}\t{{.Status}}"
    

    Expected output: a table showing (healthy) next to each AI service.

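  7. Simulate a hung server and observe the healthcheck transition. Send SIGSTOP from the host, because a SIGSTOP sent to PID 1 from inside the container's own PID namespace is ignored:

    docker compose kill -s SIGSTOP vllm   # freezes the container's main process
    docker compose ps
    

    After roughly 45 to 60 seconds (three consecutive probes, each timing out after 5s, started 15s apart), the status changes to (unhealthy). Resume the process with docker compose kill -s SIGCONT vllm.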
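
The time until Docker reports (unhealthy) follows from the interval, timeout, and retries values in step 2. The sketch below works through that arithmetic. It assumes Docker starts each probe one interval after the previous probe finishes, and that a probe against a hung server runs until the timeout while a probe against a closed port fails almost instantly. The function name seconds_until_unhealthy is hypothetical, chosen for this example.

```python
def seconds_until_unhealthy(interval, timeout, retries, probe_hangs=True):
    """Return (earliest, latest) seconds between a failure and the unhealthy status.

    Assumes every probe after the failure fails, and that Docker waits
    `interval` seconds after one probe finishes before starting the next.
    """
    # A probe against a hung server runs until `timeout`; a refused
    # connection fails almost immediately, modeled here as 0 seconds.
    probe_cost = timeout if probe_hangs else 0
    # The last of `retries` consecutive failed probes flips the status.
    earliest = probe_cost + (retries - 1) * (interval + probe_cost)
    # The failure may begin right after a probe passed, adding one interval.
    latest = earliest + interval
    return earliest, latest


if __name__ == "__main__":
    low, high = seconds_until_unhealthy(interval=15, timeout=5, retries=3)
    print(f"hung server: unhealthy after {low}-{high}s")
    low, high = seconds_until_unhealthy(15, 5, 3, probe_hangs=False)
    print(f"closed port: unhealthy after {low}-{high}s")
```

With the values from step 2, a hung server lands in the 45 to 60 second range, and a server whose port is closed lands in the 30 to 45 second range. Treat these as estimates: scheduling jitter and probe startup time add a little on top.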

Verification

docker inspect $(docker compose ps -q vllm) | jq '.[0].State.Health.Status'

Expected output: "healthy".
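
For scripted deployments, the same check can be polled until the container settles. The following is a minimal sketch, not part of the original guide: it reads the status with docker inspect --format and waits for "healthy". The function names, the 300-second deadline, and the 5-second poll interval are assumptions for illustration. The status getter is passed in as a callable so the waiting logic can be exercised without Docker.

```python
import subprocess
import sys
import time


def docker_health_status(container):
    """Ask the Docker CLI for a container's health status string."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_healthy(get_status, deadline_s=300, poll_s=5, sleep=time.sleep):
    """Poll get_status() until it returns "healthy" or deadline_s elapses.

    Returns True on "healthy", False on "unhealthy" or when time runs out.
    """
    waited = 0
    while waited <= deadline_s:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False
        sleep(poll_s)  # still "starting": the model is probably loading
        waited += poll_s
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the container ID, e.g. the one printed by `docker compose ps -q vllm`.
    container_id = sys.argv[1]
    ok = wait_for_healthy(lambda: docker_health_status(container_id))
    sys.exit(0 if ok else 1)
```

The script exits 0 once the container is healthy and 1 otherwise, so it can gate a later deployment step.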

Common failures

  • Container never becomes healthy: the start_period is likely shorter than the model loading time, so probes fail before the server is listening. Increase it, and check docker compose logs vllm for "Uvicorn running" messages, which indicate the server is ready for health checks.
  • Healthcheck succeeds but service is not serving requests: the health endpoint returns 200 during model loading. Use a /health/ready endpoint that validates the model is actually loaded and ready for inference.
  • Restart policy triggers infinite loop: restart: always combined with an immediate crash will cycle indefinitely. Use on-failure:5 to cap the attempts, or add a deploy.resources.limits.memory constraint.
  • curl not found in container: add to Dockerfile: RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*.
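
The second failure above comes down to separating liveness from readiness. Here is a minimal sketch of that split using only the Python standard library. A FastAPI service would follow the same pattern with its own route decorators. The /health/live path, the model_ready flag, and the helper names are assumptions for illustration, not part of any real server's API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set by the (hypothetical) model loader once weights are in memory.
model_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            # Liveness: the process is up and answering HTTP.
            self._reply(200, {"status": "alive"})
        elif self.path == "/health/ready":
            # Readiness: 503 until the model is loaded, so `curl -f` fails.
            if model_ready.is_set():
                self._reply(200, {"status": "ready"})
            else:
                self._reply(503, {"status": "loading"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, payload):
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the container logs


def start_health_server(port=8000):
    """Serve the health endpoints on a background thread; return the server."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call start_health_server() at process start and model_ready.set() when loading finishes, then point the healthcheck at /health/ready so the container stays in the starting state until inference can actually run.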

Related guides

  • Set up health checks for AI agent services with custom readiness probes
  • Configure GPU access in Docker Compose for AI inference
  • Deploy vLLM on Kubernetes with GPU node selection