What this does

This guide implements HTTP health check endpoints for AI agent services that report not only process liveness but also downstream dependency readiness. A liveness probe confirms the server process is running; a readiness probe verifies the agent can reach its model backend, vector database, and tool services before accepting traffic. These probes integrate directly with Docker healthchecks and Kubernetes pod lifecycle management.

Steps

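Add a /health/live liveness endpoint that returns HTTP 200 with a minimal response and calls no dependencies. The snippet assumes a FastAPI application object named app and an imported time module: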

@app.get("/health/live")
def liveness():
    return {"status": "alive", "timestamp": time.time()}

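Add a /health/ready readiness endpoint that probes each backend dependency. The example covers the model backend and the vector database; tool services can be added the same way:

import requests
from fastapi.responses import JSONResponse

# LLM_BACKEND (base URL) and db (vector database client) are assumed to be
# defined elsewhere in the service.
@app.get("/health/ready")
def readiness():
    checks = {}
    # Model backend: a non-200 answer counts as degraded, no answer as unreachable.
    try:
        resp = requests.get(f"{LLM_BACKEND}/health", timeout=2)
        checks["llm_backend"] = "ok" if resp.status_code == 200 else "degraded"
    except Exception:
        checks["llm_backend"] = "unreachable"
    # Vector database: a failed ping marks it unreachable.
    try:
        db.ping()
        checks["vector_db"] = "ok"
    except Exception:
        checks["vector_db"] = "unreachable"
    # Return 503 unless every dependency is "ok", so the orchestrator withholds traffic.
    all_ok = all(v == "ok" for v in checks.values())
    body = {"status": "ready" if all_ok else "not_ready", "checks": checks}
    return JSONResponse(content=body, status_code=200 if all_ok else 503)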
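As more dependencies are added (tool services, for example), the repeated try/except blocks can be folded into one helper. The following is a minimal sketch, not the guide's own implementation: evaluate_readiness and the check names used with it are hypothetical, and each check is assumed to be a zero-argument callable that returns True when healthy, returns False when degraded, and raises when the dependency cannot be reached.

```python
from typing import Callable, Dict, Tuple


def evaluate_readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run every check and derive the HTTP status code and JSON body.

    Hypothetical helper: mirrors the ok / degraded / unreachable states
    used by the /health/ready handler above.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "degraded"
        except Exception:
            # Any failure to reach the dependency is reported, never raised.
            results[name] = "unreachable"
    all_ok = all(state == "ok" for state in results.values())
    body = {"status": "ready" if all_ok else "not_ready", "checks": results}
    return (200 if all_ok else 503), body
```

Inside the handler, the returned pair would be passed to JSONResponse as content and status_code. Keeping the logic free of framework objects also makes it testable without starting the server.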

Test the endpoints locally:
```
curl -s http://localhost:8000/health/live && echo "" && curl -s http://localhost:8000/health/ready
```
Expected output: two JSON responses, the second showing dependency status.

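Add a healthcheck to the service definition in docker-compose.yml (a Dockerfile uses the equivalent HEALTHCHECK instruction instead). The image must include curl for this command to work: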

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health/live"]
  interval: 15s
  timeout: 5s
  retries: 3
  start_period: 30s
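Slim base images often ship without curl. In that case a short Python probe can stand in for the curl command. This is a sketch under assumptions: the function name probe_exit_code and the script name mentioned in the comment are hypothetical, and it treats only HTTP 200 as healthy, which is slightly stricter than curl -f.

```python
import urllib.error
import urllib.request


def probe_exit_code(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers with HTTP 200, otherwise 1.

    Docker treats exit code 0 as healthy and 1 as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP error statuses, refused connections, timeouts, bad URLs.
        return 1


# When saved as a script (hypothetical name: healthcheck.py), end it with:
#   raise SystemExit(probe_exit_code("http://localhost:8000/health/live"))
```

The healthcheck test entry would then invoke the script with the image's Python interpreter instead of curl.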

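For Kubernetes, define liveness and readiness probes on the container in the deployment spec. timeoutSeconds defaults to 1 second, which is shorter than the 2-second dependency timeout used by the readiness handler, so the readiness probe raises it:

livenessProbe:
  httpGet:
    path: /health/live
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 5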
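To get a feel for how quickly these settings react, the arithmetic can be written out. This is a rough upper-bound estimate, not an exact model of the kubelet: it assumes the failure begins just after a successful probe, and it uses the Kubernetes defaults failureThreshold 3 and timeoutSeconds 1, which apply to any probe that does not set them. The function name is hypothetical.

```python
def worst_case_detection_seconds(
    period_s: float, failure_threshold: int = 3, timeout_s: float = 1.0
) -> float:
    """Rough upper bound on the time between a failure and the probe acting on it.

    failure_threshold consecutive probes must fail, spaced period_s apart,
    and the last one may take up to timeout_s to be counted as failed.
    """
    return failure_threshold * period_s + timeout_s


# Liveness probe above: periodSeconds 15 with default threshold and timeout.
liveness_worst_case = worst_case_detection_seconds(period_s=15)  # 46.0
```

Lowering periodSeconds shortens this window at the cost of more probe traffic against the agent and its dependencies.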

Deploy and watch the pod status:
```
kubectl get pods -w
```
Expected output: the pod reaches Running once its container starts, but the READY column stays at 0/1 until the readiness probe succeeds, then changes to 1/1.

Simulate a dependency failure by stopping the model backend, then confirm that the readiness endpoint returns 503:
```
curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/health/ready
```
Expected output: 503.
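When stopping the real model backend is inconvenient, a small stub can play its role during local testing. The sketch below is an assumption-laden stand-in, not part of the agent: StubBackend is a hypothetical class that serves GET /health on a local port and can be switched between healthy (200) and unhealthy (503). LLM_BACKEND would need to point at its url attribute for the readiness endpoint to see it.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubBackend:
    """Hypothetical stand-in for the model backend; answers GET /health."""

    def __init__(self, port: int = 0):
        self.healthy = True
        stub = self

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                if self.path != "/health":
                    code = 404
                else:
                    code = 200 if stub.healthy else 503
                self.send_response(code)
                self.end_headers()

            def log_message(self, *args):
                pass  # keep test output quiet

        # Port 0 lets the OS pick a free port; the chosen one is in self.url.
        self.server = HTTPServer(("127.0.0.1", port), Handler)
        self.url = f"http://127.0.0.1:{self.server.server_port}"
        threading.Thread(target=self.server.serve_forever, daemon=True).start()

    def stop(self):
        """Shut the stub down, which simulates an unreachable backend."""
        self.server.shutdown()
        self.server.server_close()
```

Setting healthy to False exercises the degraded path, while calling stop() exercises the unreachable path, so both non-ready states can be observed with the curl command above.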

Verification

curl -s http://localhost:8000/health/ready | jq '.status'

Expected output: "ready" (when all dependencies are healthy).

Common failures

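Readiness probe never passes: check the dependency URLs in the agent's configuration, then run curl from inside the container against each backend to confirm it is reachable from there.
Liveness probe triggers a restart loop: increase initialDelaySeconds to give the agent time to load model weights or warm up caches.
Orchestrator kills the pod during startup: set start_period in the Docker healthcheck or initialDelaySeconds in Kubernetes to at least 30 seconds for model-heavy agents. In Kubernetes, a startupProbe is an alternative that holds off the liveness probe until the first success.
Health endpoint returns 200 but a dependency is unhealthy: confirm the orchestrator is probing /health/ready rather than /health/live, which never checks dependencies, and that every dependency has an entry in checks. Keep the short timeout on dependency calls (timeout=2) so a hung dependency is reported as unreachable instead of blocking the health check thread.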
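Per-call timeouts still add up when dependencies are checked one after another. One way to keep the whole handler inside the orchestrator's probe timeout is to run the checks concurrently under a shared deadline. The following is a sketch only: run_checks_with_deadline is a hypothetical helper, each check is assumed to be a zero-argument callable that raises on failure, and the "timeout" state is an addition that the guide's handler does not use.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict


def run_checks_with_deadline(
    checks: Dict[str, Callable[[], None]], deadline_s: float = 2.0
) -> Dict[str, str]:
    """Run all checks in parallel and stop waiting once deadline_s has passed."""
    executor = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {name: executor.submit(fn) for name, fn in checks.items()}
    end = time.monotonic() + deadline_s
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, end - time.monotonic())
        try:
            future.result(timeout=remaining)
            results[name] = "ok"
        except FutureTimeout:
            results[name] = "timeout"
        except Exception:
            results[name] = "unreachable"
    # Do not wait for stragglers; a hung check must not delay the response.
    executor.shutdown(wait=False)
    return results
```

A check that is still running when the deadline passes keeps its worker thread until it returns, so the underlying calls should still carry their own timeouts.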
