Execution layer · L3

Stacks

Executable architectures. Each stack ties model + runtime + memory + protocol + hardware into one shipping recipe — with hardware budget, setup commands, expected outcome, and the operator-grade reasoning behind every pick. Where the catalog tells you what exists and the operational reviews tell you whether to use it, the stacks tell you how to actually assemble the thing.

Stack · L3·Workstation tier·7 components

Build a local coding-agent stack (May 2026)

A coding agent that drafts diffs, runs tests, and edits files autonomously — entirely on your hardware, with persistent memory of the codebase.

RTX 4090 24GB or M3 Max 64GB · 64GB+ system RAM · 1TB NVMe · Linux or macOS

Stack · L3·Workstation tier·7 components

Build an RTX 4090 AI workstation stack (May 2026)

A general-purpose AI workstation built around a single RTX 4090 24GB — runs a 32B-class coding model, a 14B chat model, and serves agent workloads to a small team on the same box.

RTX 4090 24GB · 64GB DDR5 · 1TB NVMe · 850W PSU · Linux (Ubuntu 24.04 LTS recommended) or Windows 11 with WSL2

Stack · L3·Workstation tier·7 components

Build a Mac-native AI stack (May 2026)

A Mac-native local AI stack that takes full advantage of unified memory and (optionally) scales across multiple Macs via Thunderbolt 5 — runs 32B-class models comfortably on a single Mac, frontier-class models across a cluster.

M3 Max 64GB+ unified memory (or M4 Pro / M4 Max) · macOS 14.5+ (26.2+ for RDMA) · 1TB+ SSD · optional second Mac via Thunderbolt 5 for clustering

Stack · L3·Workstation tier·5 components

Build an offline RAG workstation stack (May 2026)

Search and chat with thousands of private documents on a single workstation that never phones home — for legal, medical, financial, or any data class that legally cannot leave the network.

RTX 4090 24GB or M3 Max 64GB · 64GB+ system RAM · 2TB NVMe (documents + vector store) · Linux with no internet egress allowed by default

Stack · L3·Production tier·4 components

Build a distributed inference homelab stack (May 2026)

Run 70B-405B class models across 2-4 GPU machines on a controlled LAN. Real interconnect requirements; real monitoring; real failure modes. The path beyond 'just buy a bigger card.'

2-4x GPU nodes (each: 1-2x A100 80GB or RTX 5090 32GB · 256GB+ RAM · 2TB NVMe) · 100Gbps Ethernet or InfiniBand interconnect · dedicated head node for Ray · managed switch · UPS recommended

Stack · L3·Workstation tier·8 components

Build a memory-enabled local agent stack (May 2026)

An agent that takes a task, remembers what happened in prior sessions, retrieves relevant context from prior decisions, and avoids re-discovering the same dead ends — all on local hardware with no data leaving the machine.

RTX 4090 24GB (or 5090 32GB) · 64GB+ system RAM · 1TB NVMe · Linux preferred · PostgreSQL on the same box for the SQL-MCP path

Stack · L3·Homelab tier·5 components

Build a 16GB VRAM local AI stack (May 2026)

A useful local AI workstation on a 16GB VRAM card (RTX 4060 Ti 16GB, RTX 4080 Super, RTX 5070, M-class Apple Silicon with 24GB+ unified memory). Daily-driver quality at the budget tier without trying to pretend it's a 4090.

RTX 4060 Ti 16GB / RTX 4080 Super 16GB / RTX 5070 12-16GB / RX 7800 XT 16GB · 32GB+ system RAM · 1TB NVMe · Linux / Windows / macOS

Stack · L3·Workstation tier·6 components

Build a local reasoning-model stack (May 2026)

Run a reasoning-class model locally for math, code synthesis, multi-step analysis, and long-horizon problem-solving. Honest about the reasoning-token cost (extra 200-2000 tokens per query) and the hardware requirements that follow.

RTX 4090 24GB (minimum) or 5090 32GB / M3-M4 Max 64GB · 64GB+ system RAM · 1TB NVMe · long-context-friendly thermal envelope (sustained generation runs hot)

Stack · L3·Workstation tier·6 components

Build a local vision-model stack (May 2026)

Run a vision-language model locally for image understanding, document Q&A over screenshots, OCR-plus-reasoning, and visual analysis tasks. All processing on your hardware; images never leave the network.

RTX 4090 24GB / 5090 32GB / M3-M4 Max 64GB (vision tokens consume more VRAM than text) · 64GB+ system RAM · 1TB NVMe (image cache + model weights)

Stack · L3·Production tier·4 components

Build a multi-machine Apple Silicon cluster (May 2026)

Run frontier-class models (DeepSeek V3, Llama 4 Maverick) locally on a personal-affordable Apple Silicon cluster. Honest about what works (Thunderbolt 5 RDMA, Exo's pipeline parallel) and what doesn't (NVIDIA-only frameworks, training workloads, multi-tenant serving).

4-8x M4 Pro Mac Mini (each: 64GB unified memory recommended, 32GB minimum) · macOS 26.2+ on every node (RDMA prerequisite) · Thunderbolt 5 daisy-chain or hub · 10GbE backup network · 1500W UPS

Stack · L3·Workstation tier·7 components

Build a fully offline coding stack (May 2026)

An autonomous coding agent that runs entirely on a workstation with no outbound network egress. Pre-staged models, audited dependency chain, network-monitored verification — the stack that holds up to real air-gap audits.

RTX 4090 24GB · 64GB+ system RAM · 2TB NVMe · Linux (Ubuntu 24.04 LTS recommended) with iptables egress block · pre-staged models on local mirror