Pillar guides
Hand-written, deeply researched guides on running AI locally. No listicles, no hedged "X might be good for Y" filler — clear opinions backed by real benchmarks.
Available now
Choosing a GPU for local AI in 2026
Tier-by-tier buying guide across NVIDIA RTX 50/40/30, AMD RX 7000/9000, Apple Silicon, and the used market. Honest verdicts per tier.
Hardware
10 min read · Last verified 2026-05-05
Will-It-Run methodology
The exact math behind our hardware compatibility predictions: KV cache sizing, runtime overhead, bandwidth-based speed prediction, and confidence levels. A toy sketch of the core arithmetic appears below.
Methodology
8 min read · Last verified 2026-05-05
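For a flavor of that math, here is a minimal Python sketch of the two core checks: does the model fit, and how fast can it decode. It assumes an fp16 KV cache and memory-bound decoding; every model and hardware number below is an illustrative assumption, not a figure from the guide.

```python
# Toy sketch of a will-it-run check: a fits-in-VRAM test plus a
# bandwidth-based speed ceiling. All numbers are illustrative assumptions.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size: keys + values for every layer, at fp16 (2 bytes/element)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len / 1e9

def fits(weights_gb, kv_gb, vram_gb, overhead_gb=1.0):
    """Weights + KV cache + a runtime overhead allowance must fit in VRAM."""
    return weights_gb + kv_gb + overhead_gb <= vram_gb

def decode_tps_ceiling(weights_gb, bandwidth_gb_s):
    """Decoding is memory-bound: each token streams roughly all weights once,
    so memory bandwidth divided by model size bounds tokens per second."""
    return bandwidth_gb_s / weights_gb

# Hypothetical 8B model at Q4 (~4.9 GB) on a 12 GB card with 360 GB/s bandwidth.
kv = kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
print(fits(4.9, kv, 12.0))             # True (~1.1 GB of KV cache)
print(decode_tps_ceiling(4.9, 360.0))  # ~73 tokens/sec upper bound
```

The small n_kv_heads value is why the KV cache stays manageable: models with grouped-query attention keep far fewer KV heads than attention heads. The guide itself covers the actual overhead constants and how the confidence levels are assigned.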
In the queue
Guides we're writing next, in roughly priority order. Vote for one or suggest a topic at hello@runlocalai.co.
Quantization formats explained: GGUF, EXL2, AWQ, GPTQ, MLX
What Q4_K_M actually means, how mixed-precision quants work, why EXL2 is faster on NVIDIA, and which format to pick for which runner.
Planned
llama.cpp vs vLLM vs ExLlamaV2: when each one wins
Three runners, three philosophies, three different optimal use cases. Real benchmark comparisons and honest tradeoffs.
Planned
Local AI on Apple Silicon: the unified memory advantage
How M-series chips run LLMs, when MLX beats llama.cpp, and which Mac configurations are worth the price.
Planned
Local AI on AMD in 2026: the ROCm story
Where AMD has caught up, where it still trails, and which AMD configurations are worth buying for AI workloads.
Planned
Building a local coding assistant
Pick a model, pair it with an IDE/agent, configure it for your codebase. Real setup walkthrough.
Planned
Local RAG without the bloat
A minimal stack for running retrieval-augmented generation entirely on your machine. No LangChain heroics required (see the sketch below).
Planned
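To make "no LangChain heroics" concrete, here is a minimal Python sketch of the whole loop: embed chunks locally, retrieve by cosine similarity, and prompt a local model over an OpenAI-compatible API (llama.cpp's llama-server exposes one). The embedding model, port, and sample documents are illustrative assumptions, not recommendations from the planned guide.

```python
# Minimal local RAG: embed, retrieve by cosine similarity, prompt a local
# model. Model names, port, and documents are illustrative assumptions.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

docs = [
    "llama.cpp's llama-server exposes an OpenAI-compatible HTTP API.",
    "KV cache size grows linearly with context length.",
    "EXL2 is ExLlamaV2's quantization format and targets NVIDIA GPUs.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs locally
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity: vectors are unit-normalized
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "How do I serve a GGUF model over HTTP?"
context = "\n".join(retrieve(query))

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # no auth by default
reply = client.chat.completions.create(
    model="local",  # llama-server serves its one loaded model regardless of name
    messages=[{"role": "user",
               "content": f"Answer using this context:\n{context}\n\nQuestion: {query}"}],
)
print(reply.choices[0].message.content)
```

That is the entire stack: one embedding model, a NumPy dot product as the vector store, and one HTTP call. A real setup swaps in proper chunking and a persistent index, but the shape stays the same.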