llama.cpp

The bedrock of local LLM inference. Most other tools wrap or embed it. Maximum control, maximum platform support, sharpest learning curve.

By Fredoline Eruo · Last verified May 6, 2026 · 90,000 GitHub stars

Overview


Operating systems	macOS, Linux, Windows, BSD, Android
Compute backends	NVIDIA CUDA, AMD ROCm, Apple Metal, Vulkan, CPU
License	Open source (MIT) · free

llama.cpp is free to download and use, and it is open source under the permissive MIT license.

llama.cpp runs on macOS, Linux, Windows, BSD, and Android.

For GPU acceleration, llama.cpp supports NVIDIA CUDA, AMD ROCm, Apple Metal, and Vulkan. It can also run entirely on the CPU, but CPU-only inference is considerably slower than GPU-accelerated inference.
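In llama.cpp the compute backend is chosen when you build it, through CMake options. The sketch below is a hypothetical Python helper that assembles the configure command for each backend listed above. The names `configure_command` and `BACKEND_FLAGS` are illustrative and are not part of llama.cpp. The option names (`GGML_CUDA`, `GGML_HIP`, `GGML_VULKAN`, `GGML_METAL`) are an assumption based on the upstream build documentation at the time of writing. They have been renamed in past releases, so check the repository's build docs before relying on them.

```python
"""Hypothetical helper: map a llama.cpp compute backend to a CMake configure command."""

# ASSUMPTION: option names follow upstream build docs at the time of writing;
# they have changed across releases, so verify them against the current repo.
BACKEND_FLAGS: dict[str, list[str]] = {
    "cuda": ["-DGGML_CUDA=ON"],      # NVIDIA GPUs
    "rocm": ["-DGGML_HIP=ON"],       # AMD GPUs via ROCm/HIP
    "metal": [],                     # Apple Silicon: Metal is on by default on macOS
    "vulkan": ["-DGGML_VULKAN=ON"],  # cross-vendor GPU backend
    "cpu": [],                       # no GPU option (on macOS, add -DGGML_METAL=OFF)
}


def configure_command(backend: str, build_dir: str = "build") -> list[str]:
    """Return the argv for `cmake -B <build_dir>` plus the backend's options."""
    key = backend.lower()
    if key not in BACKEND_FLAGS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(BACKEND_FLAGS)}"
        )
    return ["cmake", "-B", build_dir, *BACKEND_FLAGS[key]]


if __name__ == "__main__":
    # Print one configure command per backend.
    for name in BACKEND_FLAGS:
        print(" ".join(configure_command(name)))
```

After configuring, `cmake --build build --config Release` compiles the binaries. Keeping every backend's options in one table makes the helper easy to update when upstream renames an option.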

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.