BLK · DECISION SURFACE

Three questions, three answers.

Local AI gets confusing fast. This page exists so you don't have to read 50 spec sheets before you can act. Pick the question that matches what you're actually asking, get a real answer.

STARTER PATHS

Three setups we'd actually recommend.

Not exhaustive. Opinionated. Each path is a real hardware + software bundle we publish, not a marketing wrapper. Pick one, follow the linked stack page, and you're running local AI in under an hour.

CHEAPEST
≤ $300 budget
Hardware: Used RTX 3060 12GB · 16GB system RAM
Fits: 7B–14B at Q4 · 32B with offload

The cheapest sane way to run useful local AI in 2026. A used 3060 12GB pairs with Ollama + Open WebUI for chat, code completion, and a small RAG setup. You won't run 70B, but Phi-4 14B at Q4 holds its own for daily work.
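If you want to see what the software side amounts to, here is a minimal sketch that talks to a locally running Ollama server over its default HTTP API on port 11434. It assumes you've already pulled a Phi-4 Q4 build under the tag "phi4"; swap in whatever model tag you actually pulled. Open WebUI talks to the same server, so anything that works here also shows up in the chat UI.

    # Minimal chat call against a local Ollama server (default port 11434).
    # Assumptions: Ollama is running, and a Phi-4 Q4 build has been pulled
    # under the tag "phi4"; swap in whatever tag you actually pulled.
    import json
    import urllib.request

    def chat(prompt: str, model: str = "phi4") -> str:
        payload = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # ask for one complete JSON response
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/chat",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["message"]["content"]

    print(chat("Explain Q4 quantization in two sentences."))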

Follow the 16GB VRAM Local AI stack

BALANCED
≈ $800–$1,200 budget (CUDA path)
Hardware: RTX 4070 Ti SUPER 16GB · 32GB system RAM
Fits: 14B comfortably · 32B at Q4 · 70B at IQ2/IQ3

The sweet spot for most readers. Snappy 32B chat, a coding agent that doesn't feel laggy, and headroom for vision models. Apple Silicon is a parallel sweet spot if portability matters more than ecosystem breadth; see the Apple Silicon stack for that route (an M3 Max with 36GB is ~$2k, a different price band).
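The Fits lines on all three paths are mostly arithmetic: weights take roughly parameter count times bits-per-weight divided by eight, and whatever doesn't fit in VRAM spills to system RAM through partial offload, at a real speed cost. Here is a rough sketch of that estimate; the bits-per-weight figures are approximate averages for common GGUF quant types, and the result ignores KV cache and runtime overhead, so read it as a floor:

    # Back-of-envelope weight footprint for a quantized model.
    # Bits-per-weight values are rough averages for common GGUF quant
    # types; real files vary by a few percent, and this ignores the KV
    # cache and runtime overhead entirely.
    BPW = {"Q4": 4.85, "IQ3": 3.5, "IQ2": 2.4}

    def weight_gb(params_billion: float, quant: str) -> float:
        return params_billion * BPW[quant] / 8  # billions of params -> GB

    for params, quant in [(14, "Q4"), (32, "Q4"), (70, "IQ2"), (70, "Q4")]:
        print(f"{params}B at {quant}: ~{weight_gb(params, quant):.1f} GB of weights")

Whether a given size then runs fully on-GPU comes down to the exact quant variant, context length, and how much offload you can tolerate.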

Follow the 16GB VRAM Local AI stack

BEST-VALUE
≈ $1,400–$2,000 budget
Hardware: Used dual RTX 3090 24GB · 48GB pooled VRAM
Fits: 70B Q4 comfortably · long-context 32K+

The highest tok/s-per-dollar configuration we recommend. Two used 3090s pool to 48GB of VRAM, run 70B at Q4 with room for a 32K KV cache, and stay within reach of a single 850W PSU. Bandwidth scaling across the pair is excellent for batched inference.
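That 32K figure is worth deriving once. For a Llama-3-style 70B (80 layers, 8 KV heads via grouped-query attention, head dimension 128; these architecture numbers are an assumption and vary by model), the KV cache is a simple product, and quantizing the cache, which llama.cpp for one supports, is what usually keeps 32K inside the 48GB pool next to roughly 40GB of Q4 weights:

    # KV-cache footprint for a 70B-class model at long context.
    # Architecture numbers assume a Llama-3-style 70B: 80 layers,
    # 8 KV heads (GQA), head dim 128. Other 70B models differ a little.
    LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
    CTX = 32_768  # tokens of context

    def kv_cache_gib(bytes_per_element: float) -> float:
        elements = 2 * LAYERS * KV_HEADS * HEAD_DIM * CTX  # 2 = keys + values
        return elements * bytes_per_element / 2**30

    print(f"fp16 cache at 32K:  ~{kv_cache_gib(2.0):.1f} GiB")  # ~10 GiB
    print(f"8-bit cache at 32K: ~{kv_cache_gib(1.0):.1f} GiB")  # ~5 GiB

An fp16 cache at 32K is tight next to ~40GB of Q4 weights; an 8-bit cache is what leaves the headroom.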

Follow the Dual-3090 workstation