Vulkan Compute

Vulkan compute is the compute side of Vulkan, the cross-vendor GPU API from Khronos. llama.cpp ships a Vulkan backend that runs on AMD, Intel, and NVIDIA GPUs without a vendor-specific compute stack such as CUDA or ROCm; it needs only a driver that implements Vulkan. That makes it the most portable GPU path for local inference.
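
As a rough illustration, the backend can be built and run as sketched below. This assumes the working directory is a llama.cpp checkout, that a Vulkan SDK and driver are installed, and that the source tree is recent enough to name the CMake option `GGML_VULKAN` (older trees used a different option name). The function names and the model path argument are hypothetical, chosen only for this example.

```shell
# Configure and compile llama.cpp with the Vulkan backend enabled.
# Assumes the current directory is a llama.cpp checkout.
build_llama_vulkan() {
  cmake -B build -DGGML_VULKAN=ON &&
    cmake --build build --config Release
}

# Run a short prompt with GPU offload requested.
# $1 is the path to a GGUF model file (supplied by the caller).
# -ngl 99 asks for up to 99 layers on the GPU, which covers most models.
run_llama_vulkan() {
  ./build/bin/llama-cli -m "$1" -ngl 99 -p "Hello"
}
```

Calling `build_llama_vulkan` and then `run_llama_vulkan /path/to/model.gguf` should produce startup logs that mention the Vulkan device if the backend was picked up.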

Performance is typically 70–90% of the vendor-native path (CUDA on NVIDIA, ROCm on AMD). The win is portability: if your GPU is too old for ROCm or you're on an AMD APU, Vulkan is often the only path that works.
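
Before relying on Vulkan as the fallback path, it can help to confirm that the Vulkan loader actually sees the GPU. The sketch below assumes the `vulkaninfo` utility (from the vulkan-tools package) is installed and that its summary output contains `deviceName = ...` lines; the helper name `vulkan_device_names` is hypothetical.

```shell
# Print the name of each GPU found in `vulkaninfo --summary` output.
# Reads the text from stdin, so it can be exercised without a GPU.
vulkan_device_names() {
  sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p'
}

# Typical use on a machine with vulkan-tools installed:
#   vulkaninfo --summary | vulkan_device_names
```

If the command prints nothing, no Vulkan-capable device is visible and the driver installation is the first thing to check.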

Limitations vary by driver and llama.cpp version: some Intel iGPUs lack FP16 storage, multi-GPU split is not supported, and some quantization formats (older K-quants) are not yet implemented in the Vulkan kernels.
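
One way to check the FP16 limitation on a given machine is to look for the relevant feature flags in the full `vulkaninfo` output. This is a sketch: `shaderFloat16` and `storageBuffer16BitAccess` are standard Vulkan feature names, but treating them as the exact features llama.cpp depends on is an assumption, and the helper name `vulkan_has_feature` is hypothetical.

```shell
# Succeed if the named feature is reported as `true` in `vulkaninfo` output.
# Reads the text from stdin; $1 is the feature name to look for.
# With several GPUs installed, this succeeds if any one of them reports it.
vulkan_has_feature() {
  grep -Eq "^[[:space:]]*$1[[:space:]]*=[[:space:]]*true"
}

# Typical use:
#   vulkaninfo | vulkan_has_feature shaderFloat16 && echo "FP16 arithmetic reported"
#   vulkaninfo | vulkan_has_feature storageBuffer16BitAccess && echo "16-bit storage reported"
```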
