RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI APIs and Integration

12. Health Checks

Chapter 12 of 18 · 15 min
KEY INSIGHT

Health endpoints let orchestration systems verify readiness. Separate liveness probes from readiness checks to enable graceful degradation.

Kubernetes uses health checks to manage pod lifecycle. Liveness probes determine whether a container should be restarted. Readiness probes determine whether a container can receive traffic. Both must return quickly and accurately reflect the service's ability to function.

A naive health endpoint simply returns 200. This passes as soon as the server starts but says nothing about downstream dependencies. A realistic health check verifies database connectivity, cache availability, and external API reachability before reporting healthy status.

```python
import time

from fastapi import FastAPI
from fastapi.responses import JSONResponse
from pydantic import BaseModel


class HealthStatus(BaseModel):
    status: str
    checks: dict


app = FastAPI()
# app.state.db_pool (an asyncpg pool) and app.state.redis (an async Redis
# client) are expected to be created at application startup.


async def check_database() -> dict:
    try:
        start = time.perf_counter()
        async with app.state.db_pool.acquire() as conn:
            await conn.fetchval("SELECT 1")
        latency = (time.perf_counter() - start) * 1000
        return {"database": {"status": "healthy", "latency_ms": round(latency, 1)}}
    except Exception as exc:
        return {"database": {"status": "unhealthy", "error": str(exc)}}


async def check_cache() -> dict:
    try:
        start = time.perf_counter()
        await app.state.redis.ping()
        latency = (time.perf_counter() - start) * 1000
        return {"cache": {"status": "healthy", "latency_ms": round(latency, 1)}}
    except Exception as exc:
        return {"cache": {"status": "unhealthy", "error": str(exc)}}


@app.get("/health/live")
async def liveness():
    # No dependency checks: only confirms the process can answer requests.
    return HealthStatus(status="alive", checks={})


@app.get("/health/ready")
async def readiness():
    checks = {}
    checks.update(await check_database())
    checks.update(await check_cache())
    unhealthy = [k for k, v in checks.items() if v.get("status") == "unhealthy"]
    if unhealthy:
        # 503 tells the orchestrator to stop routing traffic to this instance.
        return JSONResponse(
            status_code=503,
            content=HealthStatus(status="unhealthy", checks=checks).model_dump(),
        )
    return HealthStatus(status="healthy", checks=checks)
```

Liveness endpoints return immediately with no dependency checks. A slow liveness probe can exceed the probe timeout and cause Kubernetes to restart containers unnecessarily. Readiness endpoints perform thorough checks and return 503 when dependencies fail, signaling that traffic should be routed elsewhere.

Monitor health endpoint latency in production. A health check taking more than 100ms suggests resource contention or connection pool exhaustion. Alert on prolonged slowness before it impacts actual request handling.
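
Since probes need to return quickly, and a check slower than 100ms is treated as a warning sign, one option is to run every dependency check under a timeout and record its latency. The sketch below is a minimal illustration using only the standard library. The names `timed_check` and `run_checks`, the `slow` flag, and the one-second default budget are assumptions for illustration, not part of the chapter's code.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict

SLOW_THRESHOLD_MS = 100.0  # warning threshold mentioned in the text


async def timed_check(
    name: str,
    probe: Callable[[], Awaitable[None]],
    timeout_s: float = 1.0,  # assumed per-check budget; tune per deployment
) -> dict:
    """Run one probe under a timeout and report status and latency."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {name: {"status": "unhealthy", "error": f"timed out after {timeout_s}s"}}
    except Exception as exc:
        return {name: {"status": "unhealthy", "error": str(exc)}}
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        name: {
            "status": "healthy",
            "latency_ms": round(latency_ms, 1),
            "slow": latency_ms > SLOW_THRESHOLD_MS,  # flag for alerting
        }
    }


async def run_checks(
    probes: Dict[str, Callable[[], Awaitable[None]]],
    timeout_s: float = 1.0,
) -> dict:
    """Run all probes concurrently and merge their results into one dict."""
    results = await asyncio.gather(
        *(timed_check(name, probe, timeout_s) for name, probe in probes.items())
    )
    merged: dict = {}
    for result in results:
        merged.update(result)
    return merged
```

Running the probes with `asyncio.gather` bounds the total readiness latency by the slowest single check rather than the sum of all checks. A hung dependency then costs at most `timeout_s` instead of stalling the probe indefinitely.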

EXERCISE

Add a custom health check that verifies an OpenAI-compatible API endpoint responds within acceptable latency. Include the check results in the readiness endpoint and return unhealthy status when the external API exceeds 500ms.
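
As a possible starting point for the exercise, the sketch below shows the shape such a check might take. It assumes a hypothetical `fetch` coroutine that performs the HTTP request (for example a GET against the server's `/v1/models` route, if it exposes one) and returns the status code. The name `check_llm_api` and the `llm_api` result key are also illustrative. The HTTP client is deliberately left out so the latency logic stays self-contained.

```python
import asyncio
import time
from typing import Awaitable, Callable

MAX_LATENCY_MS = 500.0  # threshold stated in the exercise


async def check_llm_api(
    fetch: Callable[[], Awaitable[int]],  # hypothetical: returns an HTTP status code
    max_latency_ms: float = MAX_LATENCY_MS,
) -> dict:
    """Report unhealthy if the API errors, returns non-200, or is too slow."""
    start = time.perf_counter()
    try:
        # Cancel the request once the latency budget is exhausted.
        status_code = await asyncio.wait_for(fetch(), timeout=max_latency_ms / 1000)
    except asyncio.TimeoutError:
        return {
            "llm_api": {
                "status": "unhealthy",
                "error": f"no response within {max_latency_ms:.0f}ms",
            }
        }
    except Exception as exc:
        return {"llm_api": {"status": "unhealthy", "error": str(exc)}}

    latency_ms = (time.perf_counter() - start) * 1000
    if status_code != 200:
        return {"llm_api": {"status": "unhealthy", "error": f"HTTP {status_code}"}}
    return {"llm_api": {"status": "healthy", "latency_ms": round(latency_ms, 1)}}
```

The returned dict has the same shape as the other checks in this chapter, so the readiness endpoint could merge it with `checks.update(await check_llm_api(fetch))` and reuse its existing 503 logic.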
