RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
Production Local AI Deployment

23. Security Hardening

Chapter 23 of 24 · 20 min
KEY INSIGHT

Security controls introduce friction. Effective hardening applies defense-in-depth only where that friction carries acceptable operational overhead, and avoids controls that drive users toward workarounds.

### TLS Configuration

```yaml
# ingress-tls.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  tls:
    - hosts:
        - inference.prod.internal
        - inference-api.example.com
      secretName: inference-tls-cert
  rules:
    - host: inference.prod.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-server
                port:
                  number: 8000
```

### Mutual TLS for Internal Services

```yaml
# mtls-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: inference-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Separate YAML document: both resources live in the same file
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: inference-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: inference-server
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/internal-client"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/predict"]
    - from:
        - source:
            namespaces: ["gateway"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/health"]
```

### API Authentication

```python
# auth.py
from fastapi import HTTPException, Security
from fastapi.security import APIKeyHeader
from starlette.requests import Request
from collections import defaultdict
import hashlib
import time

API_KEY_HEADER = APIKeyHeader(name="X-API-Key", auto_error=False)


class TenantAuthenticator:
    def __init__(self):
        # Maps SHA-256 hex digest of the API key -> tenant metadata
        self.api_keys: dict[str, dict] = {}
        self.cache_validity_seconds = 300
        # Request counters keyed by "<tenant_id>:<minute index>"
        self.rate_counts: defaultdict[str, int] = defaultdict(int)

    def register_tenant(self, tenant_id: str, api_key: str, rate_limit: int):
        # Store only the hash so the plaintext key never sits in memory long-term
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        self.api_keys[key_hash] = {
            'tenant_id': tenant_id,
            'rate_limit': rate_limit,
            'created_at': time.time()
        }

    def authenticate(
        self,
        request: Request,
        api_key: str = Security(API_KEY_HEADER)
    ) -> str:
        if not api_key:
            raise HTTPException(status_code=401, detail="API key required")

        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        credentials = self.api_keys.get(key_hash)

        if not credentials:
            raise HTTPException(status_code=401, detail="Invalid API key")

        # Per-tenant rate limiting (fixed one-minute windows)
        tenant_id = credentials['tenant_id']
        rate_key = f"{tenant_id}:{int(time.time() // 60)}"

        if self.rate_counts[rate_key] >= credentials['rate_limit']:
            raise HTTPException(
                status_code=429,
                detail="Rate limit exceeded"
            )

        self.rate_counts[rate_key] += 1
        return tenant_id
```

### Secrets Management

```bash
# Fetch secrets from vault
# Quote the substitutions so values containing spaces or shell metacharacters survive intact
kubectl create secret generic inference-secrets \
  --from-literal=api-key="$(vault kv get -field=value secret/inference/api-key)" \
  --from-literal=model-storage-key="$(vault kv get -field=key secret/inference/storage)" \
  -n production

# View secret references in deployment
# NOTE: secrets should never appear in container logs or stdout
```

### Security Scanning

```dockerfile
# Dockerfile with multi-stage build and security hardening
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Production stage with minimal attack surface
FROM python:3.11-slim AS production
RUN useradd --create-home --shell /bin/false appuser
# Copy packages into appuser's home: /root is not readable once we drop to appuser
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser ./app /app
USER appuser
ENV PATH=/home/appuser/.local/bin:$PATH
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
CMD ["python", "server.py"]
```
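Returning to the per-tenant rate limiting in `auth.py`: the counter dictionary there gains one entry per tenant per minute and is never cleaned up. The following is a minimal sketch of the same fixed-window idea with stale windows pruned on each call. The class name `FixedWindowLimiter`, its `allow` method, and the injectable `now` parameter are illustrative assumptions, not part of the chapter's code, and the sketch is single-process only (no locking, no shared store).

```python
import time
from collections import defaultdict
from typing import Optional


class FixedWindowLimiter:
    """Hypothetical fixed-window counter keyed by (tenant_id, window index)."""

    def __init__(self, window_seconds: int = 60):
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (tenant_id, window) -> request count

    def allow(self, tenant_id: str, limit: int, now: Optional[float] = None) -> bool:
        # `now` is injectable so the behavior can be tested without sleeping.
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)

        # Drop counters from earlier windows so the dict does not grow forever.
        for key in [k for k in self.counts if k[1] < window]:
            del self.counts[key]

        key = (tenant_id, window)
        if self.counts[key] >= limit:
            return False
        self.counts[key] += 1
        return True
```

In the authenticator, a `False` return would map to the existing `429` response. A fixed window still permits a burst of up to twice the limit across a window boundary; whether that matters depends on the workload.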

Production inference serving handles potentially sensitive data requiring protection across multiple attack surfaces: network ingress, model artifacts, API authentication, and infrastructure access. Security hardening reduces the attack surface while maintaining operational functionality.


EXERCISE

Implement API key authentication for an inference endpoint, storing only hashed API keys in Kubernetes Secrets. Configure TLS termination on the ingress controller with a valid certificate. Scan the inference container image for vulnerabilities using a local Trivy installation and address the findings in the Dockerfile.
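As a starting point for the first part of the exercise, here is a minimal sketch of issuing a key, storing only its hash, and checking a presented key against the stored hashes. It uses only the Python standard library. The `tenant_id:sha256hex` line format and the function names are assumptions made for illustration; how the Secret is mounted or injected into the pod is left to the reader.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple


def generate_api_key() -> Tuple[str, str]:
    """Return (plaintext_key, sha256_hex). Only the hash should be stored."""
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()


def load_key_hashes(raw: str) -> Dict[str, str]:
    """Parse 'tenant_id:sha256hex' lines (hypothetical Secret payload format)."""
    table = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        tenant_id, _, key_hash = line.partition(":")
        table[key_hash.strip()] = tenant_id.strip()
    return table


def tenant_for_key(api_key: str, table: Dict[str, str]) -> Optional[str]:
    """Return the tenant for a presented key, or None if no hash matches."""
    presented = hashlib.sha256(api_key.encode()).hexdigest()
    for stored_hash, tenant_id in table.items():
        # Constant-time comparison of the two hex digests.
        if hmac.compare_digest(presented, stored_hash):
            return tenant_id
    return None
```

The plaintext key is shown to the tenant once at issue time; the service only ever reads the hash table, so a leaked Secret does not directly reveal usable keys.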
