
llama.cpp build failed — get the right backend compiling

Most llama.cpp build failures trace to a missing toolkit (CUDA, Metal, Vulkan SDK), wrong compiler version, or a stale CMake cache. Diagnose in order: PATH first, CMake version second, GCC/MSVC third.

llama.cpp · CMake · CUDA Toolkit · Metal · Vulkan SDK · ROCm
By Fredoline Eruo · Last verified 2026-05-08

Diagnostic order — most likely first

#1

CUDA toolkit not in PATH

Diagnose

`nvcc --version` returns 'command not found.' CMake errors with 'CUDA compiler not found' or 'nvcc not found in PATH.'

Fix

Add CUDA bin to PATH. Linux: `export PATH=/usr/local/cuda/bin:$PATH`. Windows: add `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\bin` via System → Environment Variables. Verify with `nvcc --version` before retrying build.
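
If `nvcc` disappears again in a new terminal, the export only lasted for that shell session. A minimal sketch for making it persistent on Linux, assuming the `/usr/local/cuda` symlink the toolkit installer normally creates:

```bash
# Append CUDA paths to the shell profile so every new shell picks them up.
# Adjust the path if you installed a versioned toolkit (e.g. /usr/local/cuda-12.4).
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Confirm the compiler now resolves.
nvcc --version
```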

#2

CMake too old (need 3.18+)

Diagnose

`cmake --version` shows < 3.18. CMake errors with 'unknown command' or 'feature not supported.'

Fix

Upgrade CMake. Linux: `sudo snap install cmake --classic` or build from source. macOS: `brew upgrade cmake`. Windows: download installer from cmake.org. llama.cpp HEAD typically needs CMake 3.20+.
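
If your distro's repos lag behind, one package-manager-independent route is the `cmake` wheel on PyPI, which ships prebuilt binaries. A sketch, assuming Python 3 and pip are available:

```bash
# Install a current CMake into the user site; no root needed.
python3 -m pip install --user --upgrade cmake

# Make sure the pip-installed binary shadows the system one.
export PATH="$HOME/.local/bin:$PATH"
cmake --version   # should now report a recent release
```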

#3

Mismatched CUDA + GCC versions

Diagnose

Build fails with 'CUDA compiler does not support GCC 14' or similar. Common on Arch, Fedora, and other rolling-release distros, where the system GCC is newer than the latest version nvcc supports.

Fix

Install matching GCC: `sudo apt install gcc-12 g++-12` (Ubuntu) or `sudo pacman -S gcc12` (Arch). Configure CMake to use it: `cmake -B build -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 -DGGML_CUDA=ON`.
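
If you'd rather keep the system compiler for the C++ side and only hand nvcc an older host compiler, CMake exposes that as a separate variable. A sketch, assuming `g++-12` is installed as above:

```bash
# Pin only nvcc's host compiler; host C/C++ code still builds with system GCC.
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_HOST_COMPILER=g++-12
cmake --build build --config Release -j
```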

#4

Metal flag on non-Apple machine

Diagnose

Build fails immediately with 'Metal framework not found' or 'this target is only available on macOS.'

Fix

Use the right backend flag for the platform. macOS: `-DGGML_METAL=ON`. Linux/Windows NVIDIA: `-DGGML_CUDA=ON`. AMD: `-DGGML_HIP=ON` (with ROCm) or `-DGGML_VULKAN=ON`. Metal is macOS-only; don't enable it alongside CUDA or Vulkan.
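
If one script has to configure builds on several machines, selecting the flag from the environment avoids hard-coding the wrong one. A rough sketch; the `nvcc`/`vulkaninfo` probes below are heuristics, not authoritative hardware detection:

```bash
# Pick a GPU backend flag based on what this machine looks like.
if [ "$(uname)" = "Darwin" ]; then
  BACKEND="-DGGML_METAL=ON"            # Apple: Metal only
elif command -v nvcc >/dev/null 2>&1; then
  BACKEND="-DGGML_CUDA=ON"             # NVIDIA toolkit present
elif command -v vulkaninfo >/dev/null 2>&1; then
  BACKEND="-DGGML_VULKAN=ON"           # fall back to Vulkan
else
  BACKEND=""                           # CPU-only build
fi
cmake -B build $BACKEND
```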

#5

Vulkan SDK missing

Diagnose

`vulkaninfo` returns 'command not found.' Build with `-DGGML_VULKAN=ON` fails at 'Vulkan headers not found.'

Fix

Install the Vulkan SDK from LunarG (https://vulkan.lunarg.com). Linux: add the LunarG apt repo, then `sudo apt install vulkan-sdk`. macOS: install the LunarG SDK, which bundles MoltenVK. Windows: the SDK installer sets the VULKAN_SDK env var. Verify with `vulkaninfo` before retrying.
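
Before reconfiguring, it can save a cycle to confirm the SDK landed where CMake will look. A quick check, assuming a LunarG-style install that sets `VULKAN_SDK`:

```bash
# All three should succeed before -DGGML_VULKAN=ON has a chance of working.
echo "$VULKAN_SDK"                         # set by the SDK's setup-env.sh
ls "$VULKAN_SDK/include/vulkan/vulkan.h"   # the headers CMake searches for
vulkaninfo --summary                       # loader + driver sanity check
```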

#6

Stale CMake build directory

Diagnose

Build worked yesterday, broken today after pulling new commits. Cryptic CMake errors mentioning 'cached value' or 'previous configuration.'

Fix

Wipe and rebuild: `rm -rf build && cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j`. Alternatively `cmake --fresh -B build` (CMake 3.24+) preserves the directory but resets cached vars.
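
If stale caches bite you after every pull, it may be worth baking the wipe into your update routine. A sketch, substituting your own backend flag:

```bash
# Update and rebuild from a clean configuration every time.
git pull
rm -rf build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```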

Frequently asked questions

Should I build from source or use a prebuilt binary?

Prebuilt for evaluation (fastest path to a working `llama-cli`). Build from source if you want custom flags (FlashAttention, ROCm gfx targets, a specific CUDA arch), are on a non-standard distro, or need the absolute latest commit. The official releases ship reasonable defaults for most users.

Best build flags for inference on RTX 4090?

`cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA=ON -DCMAKE_CUDA_ARCHITECTURES=89` then `cmake --build build --config Release -j`. The architecture flag (89 for Ada / RTX 40-series) trims the binary to your card's compute capability and speeds the build.
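
If you're unsure of a card's compute capability, recent NVIDIA drivers can report it directly; the `compute_cap` query field needs a reasonably new driver, so older ones may reject it:

```bash
# Prints e.g. "8.9" for an RTX 4090 → pass 89 to CMAKE_CUDA_ARCHITECTURES.
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```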

Build keeps OOMing on my system — why?

CUDA compilation is RAM-intensive. The `-j` flag spawns N parallel compile jobs; if RAM < jobs × 2 GB, you'll OOM. Lower parallelism: `cmake --build build --config Release -j 4` (or `-j 2` on tight RAM). Building takes longer but won't OOM.
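
To pick the job count automatically rather than guessing, you can derive it from free memory using the ~2 GB-per-job rule of thumb above. A Linux-only sketch (`free` isn't available on macOS):

```bash
# Cap parallel jobs at available_RAM_GB / 2, never below 1.
AVAIL_GB=$(free -g | awk '/^Mem:/ {print $7}')   # the "available" column
JOBS=$(( AVAIL_GB / 2 )); [ "$JOBS" -lt 1 ] && JOBS=1
cmake --build build --config Release -j "$JOBS"
```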

Related troubleshooting

When the fix is hardware

A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time: