Pillar guides
Hand-written, deeply researched guides on running AI locally. No listicles, no hedged "X might be good for Y" filler — clear opinions backed by real benchmarks.
Available now
Choosing a GPU for local AI in 2026
Tier-by-tier buying guide across NVIDIA RTX 50/40/30, AMD RX 7000/9000, Apple Silicon, and the used market. Honest verdicts per tier.
Hardware
10 min read · Last verified 2026-05-05
Will-It-Run methodology
The exact math behind our hardware compatibility predictions: KV cache sizing, runtime overhead, bandwidth-based speed prediction, and confidence levels. A toy sketch of the core arithmetic appears below.
Methodology
8 min read · Last verified 2026-05-05
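For a flavor of that math, here is a minimal Python sketch of the two core checks: does the model fit, and how fast can it decode. It assumes an fp16 KV cache and memory-bound decoding; every model and hardware number below is an illustrative assumption, not a figure from the guide.

```python
# Toy sketch of a will-it-run check: a fits-in-VRAM test plus a
# bandwidth-based speed ceiling. All numbers are illustrative assumptions.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size: keys + values for every layer, at fp16 (2 bytes/element)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len / 1e9

def fits(weights_gb, kv_gb, vram_gb, overhead_gb=1.0):
    """Weights + KV cache + a runtime overhead allowance must fit in VRAM."""
    return weights_gb + kv_gb + overhead_gb <= vram_gb

def decode_tps_ceiling(weights_gb, bandwidth_gb_s):
    """Decoding is memory-bound: each token streams roughly all weights once,
    so memory bandwidth divided by model size bounds tokens per second."""
    return bandwidth_gb_s / weights_gb

# Hypothetical 8B model at Q4 (~4.9 GB) on a 12 GB card with 360 GB/s bandwidth.
kv = kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
print(fits(4.9, kv, 12.0))             # True (~1.1 GB of KV cache)
print(decode_tps_ceiling(4.9, 360.0))  # ~73 tokens/sec upper bound
```

The small n_kv_heads value is why the KV cache stays manageable: models with grouped-query attention keep far fewer KV heads than attention heads. The guide itself covers the actual overhead constants and how the confidence levels are assigned.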
In the queue
Guides we're writing next, in roughly priority order. Vote for one or suggest a topic at hello@runlocalai.co.
Quantization formats explained: GGUF, EXL2, AWQ, GPTQ, MLX
What Q4_K_M actually means, how mixed-precision quants work, why EXL2 is faster on NVIDIA, and which format to pick for which runner.
Planned
llama.cpp vs vLLM vs ExLlamaV2: when each one wins
Three runners, three philosophies, three different optimal use cases. Real benchmark comparisons and honest tradeoffs.
Planned
Local AI on Apple Silicon: the unified memory advantage
How M-series chips run LLMs, when MLX beats llama.cpp, and which Mac configurations are worth the price.
Planned
Local AI on AMD in 2026: the ROCm story
Where AMD has caught up, where it still trails, and which AMD configurations are worth buying for AI workloads.
Planned
Building a local coding assistant
Pick a model, pair it with an IDE/agent, configure it for your codebase. Real setup walkthrough.
Planned
Local RAG without the bloat
A minimal stack for running retrieval-augmented generation entirely on your machine. No LangChain heroics required (see the sketch below).
Planned
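To make "no LangChain heroics" concrete, here is a minimal Python sketch of the whole loop: embed chunks locally, retrieve by cosine similarity, and prompt a local model over an OpenAI-compatible API (llama.cpp's llama-server exposes one). The embedding model, port, and sample documents are illustrative assumptions, not recommendations from the planned guide.

```python
# Minimal local RAG: embed, retrieve by cosine similarity, prompt a local
# model. Model names, port, and documents are illustrative assumptions.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

docs = [
    "llama.cpp's llama-server exposes an OpenAI-compatible HTTP API.",
    "KV cache size grows linearly with context length.",
    "EXL2 is ExLlamaV2's quantization format and targets NVIDIA GPUs.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs locally
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity: vectors are unit-normalized
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "How do I serve a GGUF model over HTTP?"
context = "\n".join(retrieve(query))

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # no auth by default
reply = client.chat.completions.create(
    model="local",  # llama-server serves its one loaded model regardless of name
    messages=[{"role": "user",
               "content": f"Answer using this context:\n{context}\n\nQuestion: {query}"}],
)
print(reply.choices[0].message.content)
```

That is the entire stack: one embedding model, a NumPy dot product as the vector store, and one HTTP call. A real setup swaps in proper chunking and a persistent index, but the shape stays the same.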