Systems
Deep architecture references for the protocols, runtimes, and patterns that make local AI work. Where the entity pages tell you what to use, the systems pages tell you how it works underneath, at the level engineers need before deploying a stack.
What MCP is really solving
Model Context Protocol explained at protocol-engineering depth: lifecycle, tool invocation flow, security model, latency math, local vs remote tradeoffs, and the canonical reference stack for running MCP locally.
What distributed inference actually is
Tensor parallelism vs pipeline parallelism vs CPU offload, why VRAM pooling is the wrong mental model, the latency math that makes consumer networking the bottleneck, when one bigger GPU still wins, and the failure modes that bite production deployments.
What agent memory actually is
Vector vs graph vs OS-style memory; how Letta, Mem0, Zep, Graphiti, and MCP-memory differ in practice; the retrieval flow that determines whether the agent remembers correctly or hallucinates with confidence; when memory genuinely helps and when it actively hurts.
How agent execution systems actually work
Planning loop primitives, tool dispatch, sandbox isolation, MCP + memory integration, and the failure modes specific to autonomous task execution. The architectures under OpenHands / OpenClaw / Goose / Aider / Cline / Continue, compared at protocol-engineering depth.
Maintaining a local-AI build over time
What breaks after 3 months. Driver drift, CUDA mismatches, ROCm cycles, Windows-update fallout, Docker / runtime versioning, SSD wear from embeddings, fan curves, dust, BIOS, PSU degradation. The operator's day-90-and-beyond reality.
Observability for local AI
GPU utilization, VRAM leaks, KV-cache pressure, decode tok/s, queue depth, request latency. Prometheus + Grafana setup, dcgm-exporter, vLLM metrics, and alert thresholds that catch the failures that matter.
Securing a local-AI deployment
Network exposure, reverse proxies, Tailscale + Cloudflare Tunnel, auth layers, API-key management, sandboxing for agent code execution, supply-chain risks for model weights, audit logging, and the threat model that matters for homelab and team deployments.