23. Security Hardening

Chapter 23 of 24 · 20 min

KEY INSIGHT

Security controls introduce friction. Effective hardening applies defense-in-depth only where that friction is an acceptable operational cost, and avoids controls that drive users toward workarounds.

### TLS Configuration

```yaml
# ingress-tls.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  tls:
    - hosts:
        - inference.prod.internal
        - inference-api.example.com
      secretName: inference-tls-cert
  rules:
    - host: inference.prod.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-server
                port:
                  number: 8000
```

### Mutual TLS for Internal Services

```yaml
# mtls-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: inference-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: inference-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: inference-server
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/internal-client"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/predict"]
    - from:
        - source:
            namespaces: ["gateway"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/health"]
```

### API Authentication

```python
# auth.py
from fastapi import HTTPException, Security, Depends
from fastapi.security import APIKeyHeader
from starlette.requests import Request
from collections import defaultdict
import hashlib
import time

API_KEY_HEADER = APIKeyHeader(name="X-API-Key", auto_error=False)


class TenantAuthenticator:
    def __init__(self):
        # Maps the SHA-256 digest of each API key to tenant metadata;
        # plaintext keys are never stored.
        self.api_keys: dict[str, dict] = {}
        self.cache_validity_seconds = 300
        # In-memory, per-process request counts keyed by "<tenant_id>:<minute>".
        # Entries for past minutes are not pruned here.
        self.rate_counts = defaultdict(int)

    def register_tenant(self, tenant_id: str, api_key: str, rate_limit: int):
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        self.api_keys[key_hash] = {
            'tenant_id': tenant_id,
            'rate_limit': rate_limit,
            'created_at': time.time()
        }

    def authenticate(
        self,
        request: Request,
        api_key: str = Security(API_KEY_HEADER)
    ) -> str:
        if not api_key:
            raise HTTPException(status_code=401, detail="API key required")

        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        credentials = self.api_keys.get(key_hash)
        if not credentials:
            raise HTTPException(status_code=401, detail="Invalid API key")

        # Per-tenant rate limiting over fixed one-minute windows
        tenant_id = credentials['tenant_id']
        rate_key = f"{tenant_id}:{int(time.time() // 60)}"
        if self.rate_counts[rate_key] >= credentials['rate_limit']:
            raise HTTPException(
                status_code=429,
                detail="Rate limit exceeded"
            )
        self.rate_counts[rate_key] += 1
        return tenant_id
```

### Secrets Management

```bash
# Fetch secrets from vault
kubectl create secret generic inference-secrets \
  --from-literal=api-key="$(vault kv get -field=value secret/inference/api-key)" \
  --from-literal=model-storage-key="$(vault kv get -field=key secret/inference/storage)" \
  -n production

# View secret references in deployment
# NOTE: secrets should never appear in container logs or stdout
```

### Security Scanning

```dockerfile
# Dockerfile with multi-stage build and security hardening
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Production stage with minimal attack surface
FROM python:3.11-slim AS production
RUN useradd --create-home --shell /bin/false appuser
# Place packages under appuser's home: /root is not readable once we drop to appuser
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser ./app /app
USER appuser
ENV PATH=/home/appuser/.local/bin:$PATH
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
CMD ["python", "server.py"]
```
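The per-tenant rate limiting in `auth.py` counts requests per one-minute window in process memory. The sketch below isolates that logic so it can be tested on its own. It is a minimal illustration, not the chapter's implementation: the class name `FixedWindowRateLimiter`, the `allow` method, and the injectable `clock` parameter are all hypothetical, and the pruning of expired windows is an addition assumed here to keep memory bounded.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Per-tenant fixed-window counter (hypothetical helper, not part of auth.py)."""

    def __init__(self, window_seconds: int = 60, clock=time.time):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        # Keys are (tenant_id, window_index); values are request counts.
        self.counts = defaultdict(int)

    def allow(self, tenant_id: str, limit: int) -> bool:
        """Return True and record the request if the tenant is under its limit."""
        window = int(self.clock() // self.window_seconds)

        # Drop counters from earlier windows so the dict does not grow forever.
        for stale_key in [k for k in self.counts if k[1] < window]:
            del self.counts[stale_key]

        key = (tenant_id, window)
        if self.counts[key] >= limit:
            return False
        self.counts[key] += 1
        return True
```

Because the counters live in one process, each replica enforces its own limit. A deployment with several replicas would likely need a shared store for the counts, which this sketch does not attempt.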

Production inference serving handles potentially sensitive data requiring protection across multiple attack surfaces: network ingress, model artifacts, API authentication, and infrastructure access. Security hardening reduces the attack surface while maintaining operational functionality.

EXERCISE

Implement API key authentication for an inference endpoint using hashed API keys stored in Kubernetes secrets. Configure TLS termination on the ingress controller with a valid certificate. Scan the inference container image for vulnerabilities using a local trivy installation and address findings in the Dockerfile.
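As a possible starting point for the first part of the exercise, the sketch below generates an API key, derives the SHA-256 digest that would be stored, and checks a presented key against the stored digests using a constant-time comparison. The function names `generate_api_key` and `verify_api_key` and the `stored_hashes` mapping (tenant id to hex digest) are hypothetical. How the digests reach the process, for example through a Kubernetes secret mounted as a file or environment variable, is left as an assumption.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex_digest); only the digest should be stored."""
    api_key = secrets.token_urlsafe(32)
    digest = hashlib.sha256(api_key.encode()).hexdigest()
    return api_key, digest


def verify_api_key(presented_key: str, stored_hashes: dict[str, str]) -> Optional[str]:
    """Return the tenant id whose stored digest matches the presented key, else None."""
    presented = hashlib.sha256(presented_key.encode()).hexdigest()
    matched = None
    # Compare against every entry with a constant-time check instead of
    # returning on the first match.
    for tenant_id, stored in stored_hashes.items():
        if hmac.compare_digest(presented, stored):
            matched = tenant_id
    return matched
```

A caller would hand the plaintext key to the tenant once, keep only the digest, and map a `None` result from `verify_api_key` to a 401 response.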