How to set up health checks for AI agent services with custom readiness probes
AI service running on HTTP endpoint
What this does
This guide implements HTTP health check endpoints for AI agent services that report not only process liveness but also downstream dependency readiness. A liveness probe confirms the server process is running; a readiness probe verifies the agent can reach its model backend, vector database, and tool services before accepting traffic. These probes integrate directly with Docker healthchecks and Kubernetes pod lifecycle management.
Steps
Add a
/health/liveliveness endpoint that returns HTTP 200 with a minimal response:@app.get("/health/live") def liveness(): return {"status": "alive", "timestamp": time.time()}Add a
/health/readyreadiness endpoint that probes all backend dependencies:@app.get("/health/ready") def readiness(): checks = {} try: resp = requests.get(f"{LLM_BACKEND}/health", timeout=2) checks["llm_backend"] = "ok" if resp.status_code == 200 else "degraded" except Exception: checks["llm_backend"] = "unreachable" try: db.ping() checks["vector_db"] = "ok" except Exception: checks["vector_db"] = "unreachable" all_ok = all(v == "ok" for v in checks.values()) status_code = 200 if all_ok else 503 return JSONResponse(content={"status": "ready" if all_ok else "not_ready", "checks": checks}, status_code=status_code)Test the endpoints locally:
curl -s http://localhost:8000/health/live && echo "" && curl -s http://localhost:8000/health/readyExpected output: two JSON responses, the second showing dependency status.
Add a Docker healthcheck directive in the
Dockerfileordocker-compose.yml:healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health/live"] interval: 15s timeout: 5s retries: 3 start_period: 30sFor Kubernetes, define liveness and readiness probes in the deployment spec:
livenessProbe: httpGet: path: /health/live port: 8000 initialDelaySeconds: 10 periodSeconds: 15 readinessProbe: httpGet: path: /health/ready port: 8000 initialDelaySeconds: 20 periodSeconds: 10Deploy and watch the pod status:
kubectl get pods -wExpected output: pod transitions from
ContainerCreatingtoRunningonly after the readiness probe succeeds.Simulate a dependency failure by stopping the model backend, then observe the readiness probe return 503:
curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/health/readyExpected output:
503.
Verification
curl -s http://localhost:8000/health/ready | jq '.status'
Expected output: "ready" (when all dependencies are healthy).
Common failures
- Readiness probe never passes — check dependency URLs in the agent's configuration. Use
curlfrom inside the container to manually test backend reachability. - Liveness probe triggers restart loop — increase
initialDelaySecondsto give the agent time to load model weights or warm up caches. - Orchestrator kills the pod during startup — set
start_periodin Docker healthcheck orinitialDelaySecondsin Kubernetes to at least 30 seconds for model-heavy agents. - Health endpoint returns 200 but dependency is unhealthy — ensure the probe handler uses a short timeout on dependency checks (
timeout=2) to avoid blocking the health check thread.