Reference layer

Systems

Deep architecture references for the protocols, runtimes, and patterns that make local AI work. Where the entity pages tell you what to use, the systems pages tell you how those pieces work underneath, at the level engineers need before deploying a stack.

Protocol·18 min

What MCP is really solving

Model Context Protocol explained at protocol-engineering depth: lifecycle, tool invocation flow, security model, latency math, local vs remote tradeoffs, and the canonical reference stack for running MCP locally.

Architecture·17 min

What distributed inference actually is

Tensor parallelism vs pipeline parallelism vs CPU offload, why VRAM pooling is the wrong mental model, the latency math that makes consumer networking the bottleneck, when one bigger GPU still wins, and the failure modes that bite production deployments.

Architecture·22 min

What agent memory actually is

Vector vs graph vs OS-style memory; how Letta, Mem0, Zep, Graphiti, and MCP-memory differ in practice; the retrieval flow that determines whether the agent remembers correctly or hallucinates with confidence; when memory genuinely helps and when it actively hurts.

Architecture·21 min

How agent execution systems actually work

Planning-loop primitives, tool dispatch, sandbox isolation, MCP + memory integration, and the failure modes specific to autonomous task execution. The architecture under OpenHands / OpenClaw / Goose / Aider / Cline / Continue, compared at protocol-engineering depth.

Operations·16 min

Maintaining a local-AI build over time

What breaks after 3 months. Driver drift, CUDA mismatches, ROCm cycles, Windows-update fallout, Docker / runtime versioning, SSD wear from embeddings, fan curves, dust, BIOS, PSU degradation. The operator's day-90-and-beyond reality.

Operations·14 min

Observability for local AI

GPU utilization, VRAM leaks, KV-cache pressure, decode tok/s, queue depth, request latency. Prometheus + Grafana setup, dcgm-exporter, vLLM metrics, and alert thresholds that catch the right failures.

Operations·18 min

Securing a local-AI deployment

Network exposure, reverse proxies, Tailscale + Cloudflare Tunnel, auth layers, API-key management, sandboxing for agent code execution, supply-chain risks for model weights, audit logging, and the threat model that matters for homelab and team deployments.