HOW-TO · OPS

How to set up health checks for AI agent services with custom readiness probes

Intermediate · 15 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

An AI agent service that exposes an HTTP endpoint (the examples assume a FastAPI app on port 8000)

What this does

This guide implements HTTP health check endpoints for AI agent services that report not only process liveness but also downstream dependency readiness. A liveness probe confirms the server process is running; a readiness probe verifies the agent can reach its model backend, vector database, and tool services before accepting traffic. These probes integrate directly with Docker healthchecks and Kubernetes pod lifecycle management.

Steps

  1. Add a /health/live liveness endpoint that returns HTTP 200 with a minimal response:

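    import time  # `app` is assumed to be your existing FastAPI instance

    @app.get("/health/live")
    def liveness():
        # No dependency calls here: liveness should fail only if the process itself is stuck.
        return {"status": "alive", "timestamp": time.time()}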
    
  2. Add a /health/ready readiness endpoint that probes all backend dependencies:

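    import requests
    from fastapi.responses import JSONResponse

    @app.get("/health/ready")
    def readiness():
        # LLM_BACKEND (base URL) and db (vector DB client) are assumed to be defined elsewhere.
        checks = {}
        try:
            # Point this at whatever health path your backend exposes; Ollama, for
            # example, answers on / and /api/version rather than /health.
            resp = requests.get(f"{LLM_BACKEND}/health", timeout=2)
            checks["llm_backend"] = "ok" if resp.status_code == 200 else "degraded"
        except Exception:
            checks["llm_backend"] = "unreachable"
        try:
            db.ping()
            checks["vector_db"] = "ok"
        except Exception:
            checks["vector_db"] = "unreachable"
        # Any non-"ok" result makes the probe fail with 503 so traffic is withheld.
        all_ok = all(v == "ok" for v in checks.values())
        body = {"status": "ready" if all_ok else "not_ready", "checks": checks}
        return JSONResponse(content=body, status_code=200 if all_ok else 503)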
    
  3. Test the endpoints locally:

    curl -s http://localhost:8000/health/live && echo "" && curl -s http://localhost:8000/health/ready
    

    Expected output: two JSON responses, the second showing dependency status.

  4. Add a healthcheck to the service definition in docker-compose.yml (the image must include curl; in a Dockerfile, the equivalent is the HEALTHCHECK instruction):

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/live"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s
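
If the image does not ship curl, the same check can be done with the Python interpreter that already runs the agent. The following is a minimal sketch using only the standard library; the function names `probe` and `main` and the default URL are illustrative assumptions, not part of the original service.

```python
import http.client
import sys
import urllib.error
import urllib.request

# Ignore proxy environment variables: the health endpoint is local to the container.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True only when the endpoint answers with a 2xx status."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and non-2xx answers (HTTPError).
        return False


def main(argv: list) -> int:
    """Map the probe result to a process exit code: 0 healthy, 1 unhealthy."""
    # Hypothetical default; pass another URL as the first argument to override it.
    url = argv[1] if len(argv) > 1 else "http://localhost:8000/health/live"
    return 0 if probe(url) else 1
```

Saved as a script (for example a hypothetical healthcheck.py) that ends with `sys.exit(main(sys.argv))`, it could replace the curl command in the `test` entry above. Docker only looks at the exit code, so the script prints nothing.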
    
  5. For Kubernetes, define liveness and readiness probes in the deployment spec:

    livenessProbe:
      httpGet:
        path: /health/live
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8000
      initialDelaySeconds: 20
      periodSeconds: 10
      # The default probe timeout is 1s; the handler may spend 2s per dependency.
      timeoutSeconds: 5
    
  6. Deploy and watch the pod status:

    kubectl get pods -w
    

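    Expected output: the pod's STATUS changes from ContainerCreating to Running once the container starts, and its READY column changes from 0/1 to 1/1 (for a single-container pod) only after the readiness probe succeeds.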

  7. Simulate a dependency failure by stopping the model backend, then observe the readiness probe return 503:

    curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/health/ready
    

    Expected output: 503.
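
The readiness logic from step 2 and the failure simulated in step 7 can also be exercised without a running server. The sketch below is one possible variation, not the handler from this guide: it runs the dependency probes in parallel under a single overall deadline, so one hung backend cannot stall the whole check. The name `run_checks`, the `deadline` parameter, and the extra "timeout" state are assumptions introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, Tuple


def run_checks(probes: Dict[str, Callable[[], object]],
               deadline: float = 3.0) -> Tuple[int, dict]:
    """Run every probe in parallel; a probe passes if it returns without raising."""
    if not probes:
        return 200, {"status": "ready", "checks": {}}
    pool = ThreadPoolExecutor(max_workers=len(probes))
    futures = {name: pool.submit(fn) for name, fn in probes.items()}
    done, _pending = wait(futures.values(), timeout=deadline)
    # Do not block on stragglers; they finish in the background.
    pool.shutdown(wait=False)
    checks = {}
    for name, fut in futures.items():
        if fut not in done:
            checks[name] = "timeout"
        elif fut.exception() is not None:
            checks[name] = "unreachable"
        else:
            checks[name] = "ok"
    all_ok = all(state == "ok" for state in checks.values())
    body = {"status": "ready" if all_ok else "not_ready", "checks": checks}
    return (200 if all_ok else 503), body


def healthy_backend() -> None:
    """Stand-in for a dependency that answers normally."""


def stopped_backend() -> None:
    """Stand-in for the stopped model backend from step 7."""
    raise ConnectionError("model backend is down")


status, body = run_checks({"llm_backend": stopped_backend, "vector_db": healthy_backend})
print(status, body["checks"])
# 503 {'llm_backend': 'unreachable', 'vector_db': 'ok'}
```

In the FastAPI handler, the returned pair could be passed straight to JSONResponse. Creating a thread pool per request keeps the sketch short; a long-lived pool would be the more economical choice in a real service.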

Verification

curl -s http://localhost:8000/health/ready | jq '.status'

Expected output: "ready" (when all dependencies are healthy).
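
When jq is not available, or when a deployment script has to wait for the agent, the same verification can be scripted. This is a rough sketch under the assumption that the endpoint returns the JSON shape from step 2; `readiness_report` and `wait_until_ready` are hypothetical helper names.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Tuple

# Ignore proxy environment variables; readiness URLs are usually local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def readiness_report(url: str, timeout: float = 3.0) -> Tuple[bool, dict]:
    """Return (ready, failing_checks) parsed from the readiness response."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            body = json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # A 503 from /health/ready still carries the JSON body built in step 2.
        body = json.loads(err.read())
    failing = {name: state for name, state in body.get("checks", {}).items()
               if state != "ok"}
    return body.get("status") == "ready", failing


def wait_until_ready(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the readiness endpoint until it reports ready or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            ready, failing = readiness_report(url)
        except (urllib.error.URLError, OSError, ValueError):
            # Connection refused, timeout, or a body that is not valid JSON.
            ready, failing = False, {"service": "unreachable"}
        if ready:
            return True
        print(f"attempt {attempt}: not ready, failing checks: {failing}")
        time.sleep(delay)
    return False
```

Calling `wait_until_ready("http://localhost:8000/health/ready")` returns True as soon as the status field is "ready", and the printed lines show which dependency is holding the service back in the meantime.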

Common failures

  • Readiness probe never passes: check the dependency URLs in the agent's configuration, then run curl from inside the container to test whether each backend is reachable.
  • Liveness probe triggers a restart loop: increase initialDelaySeconds, or add a startupProbe, to give the agent time to load model weights or warm up caches.
  • Orchestrator kills the pod during startup: set start_period in the Docker healthcheck or initialDelaySeconds in Kubernetes to at least 30 seconds for model-heavy agents.
  • Health endpoint returns 200 but a dependency is unhealthy: confirm the orchestrator is probing /health/ready rather than /health/live, and that every dependency is included in checks. Keep a short timeout on each dependency check (timeout=2) so a hung backend surfaces as a failure instead of blocking the handler.
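
For the restart-loop and startup failures above, it helps to estimate how long Kubernetes waits before acting on a probe. The sketch below gives a lower bound only: it assumes every probe fails immediately and ignores timeoutSeconds and kubelet scheduling jitter. The function name is illustrative.

```python
def earliest_failure_seconds(initial_delay: int, period: int,
                             failure_threshold: int = 3) -> int:
    """Lower bound on seconds from container start until the probe is declared failed."""
    # The first probe fires no earlier than initial_delay; each additional
    # consecutive failure needs one more period.
    return initial_delay + (failure_threshold - 1) * period


# Values from step 5. failureThreshold is not set there, so the Kubernetes
# default of 3 applies.
liveness_budget = earliest_failure_seconds(initial_delay=10, period=15)
readiness_budget = earliest_failure_seconds(initial_delay=20, period=10)
print(liveness_budget, readiness_budget)  # 40 40
```

With the step 5 settings, an agent that needs much longer than about 40 seconds before /health/live answers is at risk of being restarted. Raising the liveness initialDelaySeconds to 30 moves that bound to 60 seconds.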
