llama.cpp build failed — get the right backend compiling
Most llama.cpp build failures trace to a missing toolkit (CUDA, Metal, Vulkan SDK), wrong compiler version, or a stale CMake cache. Diagnose in order: PATH first, CMake version second, GCC/MSVC third.
Diagnostic order — most likely first
CUDA toolkit not in PATH
`nvcc --version` returns 'command not found.' CMake errors with 'CUDA compiler not found' or 'nvcc not found in PATH.'
Add CUDA bin to PATH. Linux: `export PATH=/usr/local/cuda/bin:$PATH`. Windows: add `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\bin` via System → Environment Variables. Verify with `nvcc --version` before retrying build.
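The PATH fix can be scripted and verified in one step. A minimal sketch for Linux, assuming the default install prefix `/usr/local/cuda` (adjust if your toolkit lives elsewhere):

```shell
# Put the CUDA toolchain on PATH for the current shell, then verify.
# /usr/local/cuda is the default Linux prefix -- adjust if yours differs.
export PATH=/usr/local/cuda/bin:$PATH

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version    # should print the toolkit release
else
    echo "nvcc still not found -- is the CUDA toolkit actually installed?" >&2
fi
```

Add the `export` line to `~/.bashrc` (or your shell's rc file) so it persists across sessions.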
CMake too old (need 3.18+)
`cmake --version` shows < 3.18. CMake errors with 'unknown command' or 'feature not supported.'
Upgrade CMake. Linux: `sudo snap install cmake --classic` or build from source. macOS: `brew upgrade cmake`. Windows: download installer from cmake.org. llama.cpp HEAD typically needs CMake 3.20+.
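Build scripts can gate on the version check itself rather than failing mid-configure. A sketch; the `version_ge` helper is illustrative, not part of llama.cpp:

```shell
# Returns success when version $1 is at least version $2.
# sort -V does the comparison; available in GNU coreutils and busybox.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

have=$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')
if [ -n "$have" ] && version_ge "$have" "3.18"; then
    echo "CMake $have is new enough"
else
    echo "CMake missing or older than 3.18 -- upgrade before configuring" >&2
fi
```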
Mismatched CUDA + GCC versions
Build fails with 'CUDA compiler does not support GCC 14' or similar. Common on Arch / Fedora / rolling distros where system GCC is newer than CUDA's supported version.
Install matching GCC: `sudo apt install gcc-12 g++-12` (Ubuntu) or `sudo pacman -S gcc12` (Arch). Configure CMake to use it: `cmake -B build -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 -DGGML_CUDA=ON`.
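On rolling distros you can probe for a supported GCC and feed it to CMake automatically. A sketch that assumes GCC 11–13 are acceptable for your CUDA release (check the CUDA release notes for the actual ceiling):

```shell
# Probe for the newest versioned GCC and build the configure line from it.
for v in 13 12 11; do
    if command -v "gcc-$v" >/dev/null 2>&1; then
        cc="gcc-$v"; cxx="g++-$v"
        break
    fi
done

cmd="cmake -B build -DCMAKE_C_COMPILER=${cc:-gcc} -DCMAKE_CXX_COMPILER=${cxx:-g++} -DGGML_CUDA=ON"
echo "$cmd"    # run this once it prints the compiler you expect
```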
Metal flag on non-Apple machine
Build fails immediately with 'Metal framework not found' or 'this target is only available on macOS.'
Use the right backend flag. macOS: `-DGGML_METAL=ON`. Linux/Windows NVIDIA: `-DGGML_CUDA=ON`. AMD: `-DGGML_HIP=ON` (with ROCm) or `-DGGML_VULKAN=ON`. Don't mix Metal with anything else.
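A small platform switch keeps the mutually exclusive flags straight. A sketch that assumes NVIDIA on non-Apple machines; swap in `-DGGML_HIP=ON` or `-DGGML_VULKAN=ON` for AMD:

```shell
# Choose exactly one GPU backend flag per platform.
case "$(uname -s)" in
    Darwin) backend="-DGGML_METAL=ON" ;;  # Apple GPUs only
    *)      backend="-DGGML_CUDA=ON"  ;;  # assumes NVIDIA; AMD wants HIP or Vulkan
esac

echo "cmake -B build $backend"
```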
Vulkan SDK missing
`vulkaninfo` returns 'command not found.' Build with `-DGGML_VULKAN=ON` fails at 'Vulkan headers not found.'
Install the Vulkan SDK from LunarG (https://vulkan.lunarg.com). Linux: add LunarG's apt repository, then `sudo apt install vulkan-sdk` (the package isn't in stock Ubuntu repos). macOS: install the LunarG SDK, which bundles MoltenVK. Windows: the SDK installer sets the VULKAN_SDK env var. Verify with `vulkaninfo` before retrying.
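A quick preflight check saves a failed configure. This sketch only reports what's visible; it installs nothing:

```shell
# Report whether Vulkan tooling is visible before passing -DGGML_VULKAN=ON.
vulkan_ok=0
if command -v vulkaninfo >/dev/null 2>&1; then
    vulkan_ok=1
    echo "vulkaninfo found: $(command -v vulkaninfo)"
else
    echo "vulkaninfo missing -- install the LunarG SDK first" >&2
fi
if [ -n "${VULKAN_SDK:-}" ]; then
    echo "VULKAN_SDK=$VULKAN_SDK"
fi
```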
Stale CMake build directory
Build worked yesterday, broken today after pulling new commits. Cryptic CMake errors mentioning 'cached value' or 'previous configuration.'
Wipe and rebuild: `rm -rf build && cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j`. Alternatively `cmake --fresh -B build` (CMake 3.24+) preserves the directory but resets cached vars.
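The two reset strategies can be folded into one snippet that picks whichever your CMake supports (`--fresh` landed in CMake 3.24):

```shell
# Reset cached CMake state: prefer --fresh (3.24+), fall back to rm -rf.
ver=$(cmake --version 2>/dev/null | awk 'NR==1 {split($3, v, "."); print v[1] * 100 + v[2]}')
if [ "${ver:-0}" -ge 324 ]; then
    echo "cmake --fresh -B build -DGGML_CUDA=ON"
else
    echo "rm -rf build && cmake -B build -DGGML_CUDA=ON"
fi
# then: cmake --build build --config Release -j
```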
Frequently asked questions
Should I build from source or use a prebuilt binary?
Prebuilt for evaluation (fastest path to a working `llama-cli`). Build from source if you want custom flags (FA, ROCm gfx targets, specific CUDA arch), are on a non-standard distro, or need the absolute latest commit. The official releases ship reasonable defaults for most users.
Best build flags for inference on RTX 4090?
`cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA=ON -DCMAKE_CUDA_ARCHITECTURES=89` then `cmake --build build --config Release -j`. The architecture flag (89 for Ada / RTX 40-series) trims the binary to your card's compute capability and speeds the build.
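If you're unsure of your card's compute capability, newer NVIDIA drivers can report it directly. A sketch; `native` is a valid fallback value on CMake 3.24+, which lets CMake detect the architecture itself:

```shell
# Query compute capability (e.g. "8.9" on Ada) and turn it into a CMake arch.
cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)
arch=$(printf '%s' "$cap" | tr -d '. ')
flag="-DCMAKE_CUDA_ARCHITECTURES=${arch:-native}"
echo "cmake -B build -DGGML_CUDA=ON $flag"
```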
Build keeps OOMing on my system — why?
CUDA compilation is RAM-intensive. The `-j` flag spawns N parallel compile jobs; if RAM < jobs × 2 GB, you'll OOM. Lower parallelism: `cmake --build build --config Release -j 4` (or `-j 2` on tight RAM). Building takes longer but won't OOM.
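You can derive a safe job count from installed RAM instead of guessing. A sketch using the rough 2 GB-per-job rule of thumb above; Linux-only, since it reads `/proc/meminfo` (it falls back to 8 GB if the file is missing):

```shell
# Size -j to available RAM: roughly one CUDA compile job per 2 GB.
mem_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo 2>/dev/null)
jobs=$(( ${mem_gb:-8} / 2 ))
[ "$jobs" -ge 1 ] || jobs=1

echo "cmake --build build --config Release -j $jobs"
```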
Related troubleshooting
Why CUDA OOM happens during local LLM inference and image gen, how to confirm the real cause, and the four real fixes (smaller quant, shorter context, gradient checkpointing, or more VRAM).
ROCm is finicky on consumer AMD GPUs in 2026. Here's the install order, the gfx-version override that fixes 80% of detection failures, and when to give up and use Vulkan.
PyTorch falsely reporting no CUDA is the most common Python ML setup failure. The cause is almost always: wrong PyTorch wheel for your CUDA version, or a CPU-only build accidentally installed.
When the fix is hardware
A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time: