04. API Gateway
The API gateway is the single entry point for all client traffic. It handles TLS, authentication, rate limiting, and request routing. Nginx with OpenResty provides the flexibility needed for header manipulation and request rewriting.
# Rate limiting zones. limit_req_zone is only valid in the http context,
# so these lines sit outside the server block.
# The general zone is defined for reuse and is not applied below.
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=upload:10m rate=1r/s;
limit_req_zone $binary_remote_addr zone=ask:10m rate=5r/m;

server {
    listen 443 ssl;
    server_name ai-app.example.com;

    ssl_certificate /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    # Accept request bodies (uploads) of up to 50 MB
    client_max_body_size 50M;

    location /api/v1/upload {
        limit_req zone=upload burst=3 nodelay;
        proxy_pass http://backend:8000/upload;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Request-ID $request_id;
    }

    location /api/v1/ask {
        limit_req zone=ask burst=2 nodelay;
        proxy_pass http://backend:8000/ask;
        proxy_buffering off;  # stream chunks to the client as they arrive
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Request-ID $request_id;
    }

    location / {
        proxy_pass http://frontend:3000;
        proxy_set_header Host $host;
    }
}
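Request IDs support distributed tracing. The configuration above uses nginx's built-in $request_id variable (available since nginx 1.11.0), which holds 16 random bytes in hexadecimal, so no Lua code is required to generate it; OpenResty's Lua hooks are only needed if a custom ID format is required. The gateway forwards the value in an X-Request-ID header, and each service should pass it along on downstream calls and include it in its logs.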
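To make the request ID visible in the gateway's own access log, one option is a custom log format. The sketch below is an assumption rather than part of the chapter's configuration: the format name trace, the variable name $trace_id, and the log path are hypothetical. It reuses an X-Request-ID sent by the caller and falls back to nginx's built-in $request_id when none is present.

```nginx
# Goes in the http context.

# Reuse an incoming X-Request-ID if one was sent; otherwise generate one.
map $http_x_request_id $trace_id {
    default $http_x_request_id;
    ""      $request_id;
}

# Access log line that carries the ID next to status and latency.
log_format trace '$remote_addr [$time_local] "$request" $status '
                 'request_id=$trace_id request_time=$request_time';

access_log /var/log/nginx/access.log trace;
```

With this in place, a location would forward the same value with proxy_set_header X-Request-ID $trace_id; so that gateway and backend logs can be joined on it. Accepting caller-supplied IDs is a design choice; a gateway facing untrusted clients may prefer to always use $request_id.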
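Rate limiting zones prevent abuse. Each zone is keyed on the client IP address ($binary_remote_addr). The upload zone allows 1 request per second per IP, with a burst of 3. The ask zone allows 5 requests per minute (one every 12 seconds) with a burst of 2, which is enough for interactive use but slows automated scraping. Requests over the limit are rejected with a 503 status by default.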
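The arithmetic behind the ask zone can be annotated directly in the configuration. The sketch below also returns 429 instead of the default 503. limit_req_status and limit_req_log_level are standard ngx_http_limit_req_module directives, but choosing 429 and warn here is an assumption, not something the chapter prescribes.

```nginx
location /api/v1/ask {
    # rate=5r/m -> one request slot every 12 s per client IP
    # burst=2   -> up to 2 extra requests may run ahead of that pace
    # nodelay   -> burst requests are served immediately, not queued
    # Net effect: 3 back-to-back requests succeed; a 4th is rejected
    # until a slot frees up roughly 12 s later.
    limit_req zone=ask burst=2 nodelay;

    limit_req_status 429;      # Too Many Requests instead of the default 503
    limit_req_log_level warn;  # log rejections at warn instead of error

    proxy_pass http://backend:8000/ask;
}
```

Returning 429 lets clients tell rate limiting apart from a genuine backend outage, which would otherwise share the 503 status.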
Streaming responses require disabling proxy buffering. With proxy_buffering off, nginx forwards each chunk to the client as soon as it arrives from the backend instead of first collecting the response in its proxy buffers.
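Buffering is usually not the only setting that affects long-lived streams. The following sketch extends the /api/v1/ask location under the assumption that the backend streams over HTTP/1.1 and may pause for a few minutes between chunks; the 300s timeout is an illustrative value, not a recommendation from the chapter.

```nginx
location /api/v1/ask {
    proxy_pass http://backend:8000/ask;

    proxy_buffering off;             # forward chunks as they arrive
    proxy_http_version 1.1;          # speak HTTP/1.1 to the backend
    proxy_set_header Connection "";  # do not send "Connection: close" upstream
    proxy_read_timeout 300s;         # default is 60s between two successive reads
}
```

A backend can also turn buffering off for a single response by sending the header X-Accel-Buffering: no, which nginx honors unless proxy_ignore_headers disables it.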
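For authentication, JWTs are validated at the gateway layer. The ngx_http_auth_jwt_module verifies token signatures locally, without a round trip to the backend, so invalid tokens receive a 401 immediately and never consume backend resources. Note that this module ships with the commercial NGINX Plus subscription; on open-source nginx or OpenResty, a third-party library such as lua-resty-jwt is needed for the same check.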
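With NGINX Plus, gateway-level validation might look like the following sketch. auth_jwt and auth_jwt_key_file are directives of ngx_http_auth_jwt_module; the realm string, the key file path, and the X-User-ID header name are hypothetical.

```nginx
location /api/v1/ {
    auth_jwt          "ai-app API";               # realm reported in the 401 response
    auth_jwt_key_file /etc/nginx/keys/jwks.json;  # JSON Web Key Set used to verify signatures

    # Forward the verified subject claim so the backend need not re-parse the token.
    proxy_set_header X-User-ID $jwt_claim_sub;

    proxy_pass http://backend:8000/;
}
```

More specific locations such as /api/v1/upload and /api/v1/ask do not inherit directives from this block. To protect them, the two auth_jwt lines would go inside each of those locations, or at the server level if the frontend should require a token as well.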
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Configure nginx with rate limiting for three endpoint types (public, authenticated, admin). Test with ab or wrk to verify limits.