RUNLOCALAI v38

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated

Local AI APIs and Integration

17. Production Hardening

Chapter 17 of 18 · 15 min
KEY INSIGHT

Production APIs require defense in depth: rate limiting, authentication, and circuit breakers protect against both accidental overload and intentional abuse.

Production environments face traffic patterns that development never simulated. Burst traffic, concurrent requests, and adversarial users stress systems in ways unit tests cannot. Hardening applies defensive measures that preserve functionality while limiting the damage from unexpected conditions.

Rate limiting protects against both abuse and accidental overload. Token bucket algorithms allow bursts up to a fixed capacity while enforcing a sustained rate. Different limits for different clients enable fair resource allocation.

```python
from fastapi import FastAPI, Request
from slowapi import Limiter
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
from starlette.responses import JSONResponse

app = FastAPI()

# Default key: the client's IP address
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter  # slowapi looks the limiter up on app.state


@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
    # Problem-details style body plus a standard Retry-After header
    return JSONResponse(
        status_code=429,
        media_type="application/problem+json",
        headers={"Retry-After": "60"},
        content={
            "type": "https://api.example.com/errors/rate-limit",
            "title": "Too Many Requests",
            "status": 429,
            "detail": str(exc.detail),  # the limit that was exceeded
            # Conservative hint: the limits below use one-minute windows
            "retry_after": 60,
        },
    )


def api_key_or_ip(request: Request) -> str:
    # Assumes authentication has stored the validated key on request.state;
    # falls back to the client IP instead of raising AttributeError
    return getattr(request.state, "api_key", None) or get_remote_address(request)


@app.post("/v1/chat/completions")
@limiter.limit("60/minute")
async def completions(request: Request):
    # Endpoint implementation
    pass


# Per-client rate limiting, keyed on the API key
@app.post("/v1/embeddings")
@limiter.limit("120/minute", key_func=api_key_or_ip)
async def embeddings(request: Request):
    # Endpoint implementation
    pass
```

Authentication prevents unauthorized access. Clients send a bearer token in the `Authorization` header, and the server validates it before processing the request. Allowing more than one key to be valid at a time makes key rotation possible, so a compromised key can be replaced during a security incident without downtime. Token validation should happen before any business logic executes.

Circuit breakers prevent cascading failures.
When a dependency fails repeatedly, the circuit opens and requests fail immediately instead of waiting for timeouts. This prevents resource exhaustion and preserves partial functionality during outages. Once the timeout has elapsed, the breaker below lets calls through again; because the failure count is reset only by a success, a single further failure reopens the circuit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


@dataclass
class CircuitState:
    failures: int = 0
    last_failure: Optional[datetime] = None
    is_open: bool = False
    opened_at: Optional[datetime] = None


class CircuitBreaker:
    def __init__(self, threshold: int = 5, timeout: int = 60):
        self.threshold = threshold  # consecutive failures before opening
        self.timeout = timeout      # seconds to stay open before retrying
        self.state = CircuitState()

    async def call(self, func, *args, **kwargs):
        if self.state.is_open:
            if datetime.now() - self.state.opened_at > timedelta(seconds=self.timeout):
                # Cooldown elapsed: allow calls again. failures stays at the
                # threshold, so one more failure reopens the circuit.
                self.state.is_open = False
            else:
                raise CircuitOpenError("Circuit open - dependency unavailable")

        try:
            result = await func(*args, **kwargs)
            self.state.failures = 0  # a success resets the consecutive count
            return result
        except Exception:
            self.state.failures += 1
            self.state.last_failure = datetime.now()
            if self.state.failures >= self.threshold:
                self.state.is_open = True
                self.state.opened_at = datetime.now()
            raise  # re-raise, preserving the original traceback
```
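
Returning to rate limiting: the token bucket algorithm mentioned earlier can be sketched in a few lines. This is a minimal in-process illustration, not a description of how slowapi works internally; `TokenBucket`, `PerClientLimiter`, and their parameters are hypothetical names chosen here. Each bucket refills at `rate` tokens per second up to `capacity`, so a client can burst up to `capacity` requests and then sustain roughly `rate` requests per second. Per-client tiers show how different clients can receive different limits, and an injectable clock keeps the behavior testable.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Hypothetical token bucket: bursts up to `capacity`, refills at `rate`/s."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """Hypothetical per-client limiter: one bucket per key, limits set by tier."""

    def __init__(self, tiers: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.tiers = tiers          # tier name -> (rate per second, capacity)
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_key: str, tier: str = "default") -> bool:
        bucket = self.buckets.get(client_key)
        if bucket is None:
            rate, capacity = self.tiers[tier]
            bucket = TokenBucket(rate, capacity, self.clock)
            self.buckets[client_key] = bucket
        return bucket.allow()
```

With a tier of `(1.0, 5)`, a client can send five back-to-back requests, the sixth is rejected, and one more is admitted for each second that passes. In a multi-process deployment the bucket state would need to live in a shared store; this sketch keeps it in memory.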

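For the authentication step, here is a framework-agnostic sketch of bearer-token validation that supports key rotation without downtime. It assumes API keys are stored only as SHA-256 digests and that several keys may be active at once during a rotation window; `KeyStore`, `authenticate`, and `AuthError` are hypothetical names, not part of FastAPI or any other library. The check is meant to run before any business logic and compares digests with `hmac.compare_digest`.

```python
import hashlib
import hmac
from typing import Dict, Optional


class AuthError(Exception):
    """Hypothetical error; a web framework would map it to HTTP 401."""


def _digest(key: str) -> bytes:
    return hashlib.sha256(key.encode("utf-8")).digest()


class KeyStore:
    """Hypothetical store of active API keys, kept only as SHA-256 digests."""

    def __init__(self) -> None:
        self._active: Dict[str, bytes] = {}   # key_id -> digest

    def add(self, key_id: str, key: str) -> None:
        self._active[key_id] = _digest(key)

    def revoke(self, key_id: str) -> None:
        self._active.pop(key_id, None)

    def lookup(self, key: str) -> Optional[str]:
        candidate = _digest(key)
        match = None
        # Check every active key using a constant-time comparison
        for key_id, stored in self._active.items():
            if hmac.compare_digest(candidate, stored):
                match = key_id
        return match


def authenticate(authorization: Optional[str], store: KeyStore) -> str:
    """Validate an Authorization header value; return the matching key_id."""
    if not authorization:
        raise AuthError("missing Authorization header")
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError("expected 'Bearer <token>'")
    key_id = store.lookup(token.strip())
    if key_id is None:
        raise AuthError("invalid or revoked API key")
    return key_id
```

To rotate a key under this scheme, issue the new key with `add`, let clients switch over while both keys validate, then `revoke` the old one. The returned `key_id` is the kind of value that could be stored on `request.state.api_key` so a per-client rate limit can key on it.
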
EXERCISE

Implement a circuit breaker for external model provider calls. Configure the breaker to open after 3 consecutive failures and attempt recovery every 30 seconds. Log state transitions for monitoring.
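
One possible solution sketch for the exercise, assuming the provider client exposes async functions. Instead of the boolean `is_open` flag used earlier, it tracks explicit CLOSED, OPEN, and HALF_OPEN states, logs each transition through the standard-library `logging` module, and takes an injectable clock so the 30-second recovery can be tested without sleeping. `ProviderBreaker`, `State`, and their parameters are hypothetical names.

```python
import logging
import time
from enum import Enum
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("circuit_breaker")


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class ProviderBreaker:
    """Hypothetical breaker for external model-provider calls."""

    def __init__(self, name: str, threshold: int = 3, recovery: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.name = name
        self.threshold = threshold    # consecutive failures before opening
        self.recovery = recovery      # seconds to wait before trial calls
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def _transition(self, new: State) -> None:
        # Log every state change so monitoring can alert on OPEN
        if new is not self.state:
            log.warning("breaker %s: %s -> %s", self.name, self.state.value, new.value)
            self.state = new

    async def call(self, func: Callable[..., Awaitable[T]], *args, **kwargs) -> T:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery:
                raise CircuitOpenError(f"{self.name} unavailable, circuit open")
            # Recovery window elapsed: let calls through as trials
            self._transition(State.HALF_OPEN)
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial, or too many consecutive failures, (re)opens
            if self.state is State.HALF_OPEN or self.failures >= self.threshold:
                self.opened_at = self.clock()
                self._transition(State.OPEN)
            raise
        self.failures = 0
        self._transition(State.CLOSED)
        return result
```

Each provider call would then be wrapped as `await breaker.call(provider_fn, prompt)`, where `provider_fn` stands in for whatever async client function you use. In the half-open state this sketch lets every call through as a trial: the first failure reopens the circuit and the first success closes it. A stricter variant would admit only one trial request at a time.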

← Chapter 16: Caching Layer
Chapter 18: API Gateway Project →