04. API Authentication
Local AI APIs need authentication even when the system isn't publicly accessible. Authentication prevents unauthorized internal access, limits blast radius from compromised devices, and provides audit trails.
Authentication patterns for local AI services:
API keys are the simplest approach. Generate a random string, store it securely, and require it in request headers:
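```bash
# Generate once and store: echo "sk-local-$(openssl rand -hex 32)" > ~/.ai/secrets/api_key
# Client sends the stored key; the JSON body (model name is an example) makes curl POST
curl -H "Authorization: Bearer $(cat ~/.ai/secrets/api_key)" \
  -d '{"model": "llama3", "prompt": "Hello"}' \
  http://localhost:11434/api/generate
```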
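The "generate and store securely" step might look like this in Python. This is a sketch: `create_api_key` and `load_api_key` are hypothetical helpers, the `sk-local-` prefix only mirrors the curl example, and the file location is whatever you choose (for instance `~/.ai/secrets/api_key`).

```python
import os
import secrets
from pathlib import Path


def create_api_key(path: Path) -> str:
    """Generate a random API key and save it to `path`, readable only by its owner.

    Hypothetical helper: the "sk-local-" prefix only mirrors the curl example.
    """
    key = "sk-local-" + secrets.token_hex(32)  # 32 random bytes -> 64 hex chars
    path.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL fails if the file already exists, so an existing key is never overwritten;
    # mode 0o600 (further limited by the umask) keeps other users from reading it
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key)
    return key


def load_api_key(path: Path) -> str:
    """Read the stored key, ignoring a trailing newline such as `echo` adds."""
    return path.read_text().strip()
```

On the server side, load the key once at startup and compare each request's header against it with `hmac.compare_digest` rather than `==`, so the comparison time does not depend on how much of the key matched.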
HTTP Basic Auth works for low-security internal setups but transmits credentials Base64-encoded (not encrypted). Only use over localhost or TLS:
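```bash
# -u builds the Basic header; the JSON body (model name is an example) makes curl POST
curl -u "operator:$(cat ~/.ai/secrets/api_key)" \
  -d '{"model": "llama3", "prompt": "Hello"}' \
  http://localhost:11434/api/generate
```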
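To see why Basic Auth needs localhost or TLS, note that the header curl builds from `-u` is plain Base64 and decodes without any secret. A minimal Python sketch; the function names are illustrative and not part of any library.

```python
import base64
from typing import Optional, Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization value that `curl -u username:password` sends."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value: str) -> Optional[Tuple[str, str]]:
    """Recover username and password from a Basic header; no key is required."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    # Split on the first colon only; passwords may themselves contain colons
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password
```

Anyone who can read the request, for example in a proxy log or on an unencrypted network path, can do exactly what `decode_basic_auth` does.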
Mutual TLS (mTLS) provides strong authentication for high-security deployments. Both client and server present certificates, preventing both impersonation and man-in-the-middle attacks.
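With Python's standard `ssl` module, the main difference between ordinary TLS and mTLS is one server-side setting: `verify_mode = ssl.CERT_REQUIRED`. A sketch, assuming you already run a private CA; the helper names are hypothetical, and the file arguments (CA bundle, certificate, private key) are placeholders that are optional here only so the contexts can be built without files. In practice all three are required.

```python
import ssl
from typing import Optional


def server_context(cafile: Optional[str] = None, certfile: Optional[str] = None,
                   keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server side of mTLS: present our certificate and require one from every client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    if cafile:
        # Trust only the private CA that issues client certificates,
        # not the system store, so publicly issued certs are not accepted
        ctx.load_verify_locations(cafile=cafile)
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def client_context(cafile: Optional[str] = None, certfile: Optional[str] = None,
                   keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client side of mTLS: verify the server against our CA and present our own cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies cert and hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cafile:
        ctx.load_verify_locations(cafile=cafile)
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

A server would pass its listening socket through `server_context(...).wrap_socket(sock, server_side=True)`. With curl, the client side corresponds to the `--cacert`, `--cert`, and `--key` options.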
Token-based auth with expiration suits multi-user environments. Issue time-limited tokens that must be refreshed:
```python
import hashlib
import hmac
import time
from typing import Optional


def generate_token(api_secret: str, user_id: str, ttl_seconds: int = 3600) -> str:
    """Generate a time-limited token of the form user_id:expiry:signature."""
    expiry = int(time.time()) + ttl_seconds
    payload = f"{user_id}:{expiry}"
    # HMAC-SHA256 over the payload; only holders of api_secret can produce it
    signature = hmac.new(
        api_secret.encode(),
        payload.encode(),
        hashlib.sha256,
    ).hexdigest()
    return f"{payload}:{signature}"


def verify_token(api_secret: str, token: str) -> Optional[str]:
    """Return the user_id if the token is authentic and unexpired, else None."""
    try:
        # rsplit keeps any ':' inside user_id intact
        user_id, expiry, signature = token.rsplit(":", 2)
        expiry_ts = int(expiry)
    except ValueError:
        return None  # Malformed token
    expected_sig = hmac.new(
        api_secret.encode(),
        f"{user_id}:{expiry}".encode(),
        hashlib.sha256,
    ).hexdigest()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input
    if not hmac.compare_digest(signature.encode(), expected_sig.encode()):
        return None  # Forged or tampered
    if expiry_ts < time.time():
        return None  # Expired
    return user_id
```
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
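One way to capture these details consistently is to save them as JSON next to each exercise. A sketch using only the standard library; `record_run` is a hypothetical helper, and anything it cannot detect on its own (model size, vector count, CPU/GPU backend, available memory) goes in the free-form `notes` field.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata
from typing import Dict, Iterable, Optional


def record_run(packages: Iterable[str], data_path: str, observed_output: str,
               notes: str = "") -> Dict[str, object]:
    """Collect runtime details for one exercise run into a JSON-serializable dict."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "data_path": data_path,
        "observed_output": observed_output,
        "notes": notes,  # e.g. model size, CPU/GPU backend, available memory
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own package names, path, and output.
    run = record_run(["requests"], "./data", "401 Unauthorized", notes="CPU only")
    print(json.dumps(run, indent=2))
```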
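Implement API key authentication for Ollama. Ollama does not check API keys itself, so keep it bound to localhost and place an authenticating reverse proxy in front of it that requires the secret on every request. Verify that requests without the key return 401 Unauthorized.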
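The heart of such a proxy is a check that runs before anything is forwarded. Here it is sketched as WSGI middleware; `require_api_key` is a hypothetical name, and `upstream_app` stands in for whatever actually forwards the request to Ollama, which is not shown.

```python
import hmac
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def require_api_key(upstream_app: WSGIApp, api_key: str) -> WSGIApp:
    """Wrap a WSGI app so only requests carrying `Authorization: Bearer <api_key>` pass."""
    expected = f"Bearer {api_key}".encode()

    def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        presented = environ.get("HTTP_AUTHORIZATION", "").encode()
        # Constant-time comparison, so response timing does not reveal matching prefixes
        if not hmac.compare_digest(presented, expected):
            body = b'{"error": "unauthorized"}'
            start_response("401 Unauthorized", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
                ("WWW-Authenticate", 'Bearer realm="local-ai"'),
            ])
            return [body]
        return upstream_app(environ, start_response)  # authorized: forward unchanged

    return app
```

For a quick local test it can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8080, require_api_key(upstream_app, key)).serve_forever()`; a longer-lived setup would more likely enforce the same rule in a dedicated reverse proxy such as nginx or Caddy.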