Runner · Open source · Free · 4.6/5

llama.cpp

The bedrock of local LLM inference. Most other tools wrap or embed it. Maximum control, maximum platform support, sharpest learning curve.

By Fredoline Eruo · Last verified May 6, 2026 · 90,000 GitHub stars

Overview

llama.cpp is the bedrock of local LLM inference; most other local-inference tools wrap or embed it. It offers maximum control and the broadest platform support of anything we review, with the sharpest learning curve to match.

Pros

  • Runs everywhere, including phones
  • Authoritative GGUF tooling (see the sketch after this list)
  • Performance-tuned per architecture
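
A rough sketch of that GGUF workflow as it stands in the current repo layout (the model directory, output names, and quant type below are placeholders; script names have changed between releases, so check the docs in your checkout):

    # Convert a Hugging Face model directory to a full-precision GGUF file
    python convert_hf_to_gguf.py ./my-model-dir --outfile my-model-f16.gguf
    # Quantize it down to a 4-bit K-quant for everyday local use
    ./build/bin/llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M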

Cons

  • Build-from-source culture (see the build sketch after this list)
  • CLI-only by default
  • Flag soup: many runtime options to learn
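
To make the build-from-source and flag-soup points concrete, here is a minimal sketch of a first build and run. The CMake invocation and CLI options are the project's documented ones at the time of writing; the model path is a placeholder:

    # Configure and build the Release binaries (from the repo root)
    cmake -B build
    cmake --build build --config Release -j
    # Generate text: -m model file, -p prompt, -n tokens to generate,
    # -c context size, --temp sampling temperature
    ./build/bin/llama-cli -m ./models/my-model-Q4_K_M.gguf \
        -p "Hello" -n 128 -c 4096 --temp 0.7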

Compatibility

Operating systems: macOS · Linux · Windows · BSD · Android
Compute backends: NVIDIA CUDA · AMD ROCm · Apple Metal · Vulkan · CPU
License: Open source · free
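
Backends are selected when you configure the build, not at run time. A hedged sketch, since exact option names have shifted between releases (GGML_CUDA and GGML_VULKAN are documented today; Metal is enabled by default on Apple builds, and the ROCm/HIP option has been renamed before, so verify against your version's build docs):

    # NVIDIA GPUs via CUDA
    cmake -B build -DGGML_CUDA=ON
    # Cross-vendor GPUs via Vulkan
    cmake -B build -DGGML_VULKAN=ON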

Get llama.cpp
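
llama.cpp ships as source, with prebuilt binaries attached to GitHub releases:

    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp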

Frequently asked

Is llama.cpp free?

Yes. llama.cpp is free to download and use, and it is open source under the permissive MIT license.

What operating systems does llama.cpp support?

llama.cpp supports macOS, Linux, Windows, BSD, and Android.

Which GPUs work with llama.cpp?

llama.cpp has GPU backends for NVIDIA CUDA, AMD ROCm, Apple Metal, and Vulkan. CPU-only inference also works, just more slowly.
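
You choose how much of the model runs on the GPU per invocation. A small sketch using the documented -ngl / --n-gpu-layers option (the model path is a placeholder; 99 simply means "offload as many layers as fit"):

    ./build/bin/llama-cli -m ./models/my-model-Q4_K_M.gguf -ngl 99 -p "Hello"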

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.