Stacks
Executable architectures. Each stack ties model + runtime + memory + protocol + hardware into one shipping recipe — with hardware budget, setup commands, expected outcome, and the operator-grade reasoning behind every pick. Where the catalog tells you what exists and the operational reviews tell you whether to use it, the stacks tell you how to actually assemble the thing.
Build a local coding-agent stack (May 2026)
A coding agent that drafts diffs, runs tests, and edits files autonomously — entirely on your hardware, with persistent memory of the codebase.
Build an RTX 4090 AI workstation stack (May 2026)
A general-purpose AI workstation built around a single RTX 4090 24GB — runs a 32B-class coding model, a 14B chat model, and serves agent workloads to a small team on the same box.
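A minimal sketch of what serving that box looks like, assuming Ollama as the runtime; the model tags shown are illustrative examples, not the guide's confirmed picks:

```shell
# Pull a 32B-class coding model and a 14B chat model (example tags)
ollama pull qwen2.5-coder:32b
ollama pull qwen2.5:14b

# Bind the API to all interfaces so teammates on the LAN can hit the box
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

The same pattern works with any OpenAI-compatible local server; the point is one box, one port, shared by the team.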
Build a Mac-native AI stack (May 2026)
A Mac-native local AI stack that takes full advantage of unified memory and (optionally) scales across multiple Macs via Thunderbolt 5 — runs 32B-class models comfortably on a single Mac, frontier-class models across a cluster.
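On a single Mac, the unified-memory path is typically MLX. A sketch assuming the `mlx-lm` package is installed; the model repo shown is an illustrative example:

```shell
# One-off generation from a quantized model held entirely in unified memory
python -m mlx_lm.generate \
  --model mlx-community/Qwen2.5-32B-Instruct-4bit \
  --prompt "Summarize the tradeoffs of unified memory for LLM inference."
```

The multi-Mac Thunderbolt path layers a cluster runtime on top of this; the single-machine command is the baseline to verify first.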
Build an offline RAG workstation stack (May 2026)
Search and chat with thousands of private documents on a single workstation that never phones home — for legal, medical, financial, or any data class that legally cannot leave the network.
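The retrieval half of that stack reduces to "score every document against the query, return the top hits." A self-contained sketch using bag-of-words cosine similarity as a stand-in for the local embedding model a real stack would use:

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    # Term frequencies; a real stack would swap in a local embedding model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    qv = vectorize(query)
    return sorted(docs, key=lambda d: cosine(qv, vectorize(d)), reverse=True)[:k]

docs = [
    "The indemnification clause survives termination of this agreement.",
    "Quarterly revenue grew eight percent year over year.",
    "Patient presented with elevated blood pressure and was prescribed medication.",
]
print(retrieve("which clause survives termination", docs, k=1))
```

Everything here runs on the local CPU with no network calls — the property the whole stack is built to preserve.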
Build a distributed inference homelab stack (May 2026)
Run 70B-405B class models across 2-4 GPU machines on a controlled LAN. Real interconnect requirements; real monitoring; real failure modes. The path beyond 'just buy a bigger card.'
Build a memory-enabled local agent stack (May 2026)
An agent that takes a task, remembers what happened in prior sessions, retrieves relevant context from prior decisions, and avoids re-discovering the same dead ends — all on local hardware with no data leaving the machine.
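The memory layer's core contract is small: record what happened, recall it later. A minimal sketch of that shape using stdlib SQLite and keyword matching; a real stack would swap the `LIKE` query for vector search:

```python
import sqlite3

class AgentMemory:
    """Episodic memory: store session notes, recall them by keyword."""

    def __init__(self, path: str = ":memory:"):
        # Pass a file path for persistence across sessions; :memory: for demo
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, kind TEXT, body TEXT)"
        )

    def record(self, kind: str, body: str) -> None:
        self.db.execute("INSERT INTO notes (kind, body) VALUES (?, ?)", (kind, body))
        self.db.commit()

    def recall(self, keyword: str) -> list[tuple[str, str]]:
        cur = self.db.execute(
            "SELECT kind, body FROM notes WHERE body LIKE ?", (f"%{keyword}%",)
        )
        return cur.fetchall()

mem = AgentMemory()
mem.record("dead_end", "Upgrading sqlalchemy to 2.x broke the auth tests; pinned to 1.4.")
mem.record("decision", "Chose pytest over unittest for new modules.")
print(mem.recall("sqlalchemy"))
```

Recording dead ends under their own `kind` is what lets the agent check "have I tried this before?" instead of rediscovering the same failure.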
Build a 16GB VRAM local AI stack (May 2026)
A useful local AI workstation on a 16GB VRAM card (RTX 4060 Ti 16GB, RTX 4080 Super, RTX 5070, M-class Apple Silicon with 24GB+ unified memory). Daily-driver quality at the budget tier, without pretending it's a 4090.
Build a local reasoning-model stack (May 2026)
Run a reasoning-class model locally for math, code synthesis, multi-step analysis, and long-horizon problem-solving. Honest about the reasoning-token cost (extra 200-2000 tokens per query) and the hardware requirements that follow.
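Those 200-2000 extra tokens translate directly into wall-clock latency. A back-of-envelope sketch, assuming a 30 tok/s decode speed (a plausible figure for a 32B-class model on a single consumer GPU, not a measured number):

```python
def extra_latency_s(reasoning_tokens: int, tokens_per_s: float) -> float:
    # Added wall-clock time spent decoding the reasoning trace alone
    return reasoning_tokens / tokens_per_s

# Assumed 30 tok/s decode speed
for n in (200, 2000):
    print(f"{n} reasoning tokens -> +{extra_latency_s(n, 30):.1f}s per query")
# -> +6.7s at the low end, +66.7s at the high end
```

That minute-per-query ceiling is why the hardware requirements follow from the token budget, not just the model size.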
Build a local vision-model stack (May 2026)
Run a vision-language model locally for image understanding, document Q&A over screenshots, OCR-plus-reasoning, and visual analysis tasks. All processing on your hardware; images never leave the network.
Build a multi-machine Apple Silicon cluster (May 2026)
Run frontier-class models (DeepSeek V3, Llama 4 Maverick) locally on a personal-affordable Apple Silicon cluster. Honest about what works (Thunderbolt 5 RDMA, Exo's pipeline parallel) and what doesn't (NVIDIA-only frameworks, training workloads, multi-tenant serving).
Build a fully offline coding stack (May 2026)
An autonomous coding agent that runs entirely on a workstation with no outbound network egress. Pre-staged models, audited dependency chain, network-monitored verification — the stack that holds up to real air-gap audits.
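One way "no outbound network egress" gets enforced rather than assumed — a sketch using Linux iptables, default-deny on output with loopback allowed (run as root; adapt to nftables or your distro's firewall as needed):

```shell
# Default-deny all outbound traffic
iptables -P OUTPUT DROP

# Allow loopback so local model server and tools keep working
iptables -A OUTPUT -o lo -j ACCEPT

# Verification step: list any established TCP connections that slipped through
ss -tn state established
```

An empty `ss` listing while the agent runs is the kind of evidence an air-gap audit actually accepts.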