llama.cpp
Runner · Open source · Free · Rated 4.6/5
Overview
llama.cpp is the bedrock of local LLM inference: most other tools wrap or embed it. It offers maximum control and the widest platform support of any runner, with the steepest learning curve to match.
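Since most tools embed llama.cpp as a library, it helps to see what that looks like. Below is a minimal sketch of a greedy generation loop against llama.h, assuming a recent release of the C API; the surface changes often between builds (for example, llama_model_load_from_file and llama_init_from_model replaced older names), and "model.gguf" stands in for a real quantized model file.

```cpp
// Minimal sketch: load a GGUF model and greedily generate a few tokens.
// Function names match recent llama.cpp releases and may differ in older ones.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    // Load the model with default parameters ("model.gguf" is a placeholder).
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    const llama_vocab * vocab = llama_model_get_vocab(model);

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // Tokenize the prompt.
    std::string prompt = "The capital of France is";
    std::vector<llama_token> tokens(prompt.size() + 8);
    int n = llama_tokenize(vocab, prompt.c_str(), (int)prompt.size(),
                           tokens.data(), (int)tokens.size(),
                           /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n);

    // Greedy sampler chain (llama.cpp's modular sampling API).
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // Evaluate the prompt, then feed each sampled token back in.
    llama_token tok = 0;
    llama_batch batch = llama_batch_get_one(tokens.data(), (int)tokens.size());
    for (int i = 0; i < 16; i++) {
        if (llama_decode(ctx, batch) != 0) break;
        tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, tok)) break;

        char buf[128];
        int len = llama_token_to_piece(vocab, tok, buf, (int)sizeof(buf), 0, true);
        if (len > 0) printf("%.*s", len, buf);

        batch = llama_batch_get_one(&tok, 1);
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Higher-level tools essentially layer model management and a friendlier interface over a loop like this one.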
Pros
- Runs everywhere — including phones
- Authoritative GGUF tooling
- Performance-tuned per architecture
Cons
- Build-from-source culture
- CLI-only by default
- Flag soup
Compatibility
| Operating systems | macOS, Linux, Windows, BSD, Android |
| GPU backends | NVIDIA CUDA, AMD ROCm, Apple Metal, Vulkan, CPU |
| License | Open source · free |
Get llama.cpp: https://github.com/ggml-org/llama.cpp
Frequently asked
Is llama.cpp free?
Yes. llama.cpp is free to download and use, and it is open source under the permissive MIT license.
What operating systems does llama.cpp support?
llama.cpp supports macOS, Linux, Windows, BSD, and Android.
Which GPUs work with llama.cpp?
llama.cpp supports NVIDIA GPUs via CUDA, AMD GPUs via ROCm, Apple GPUs via Metal, and most other GPUs via Vulkan. CPU-only inference is also possible, though considerably slower.
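In embedded use, the GPU backend is chosen at build time, and offload is controlled per model load by how many layers you ask llama.cpp to place in VRAM. A minimal sketch, assuming the same llama.h API as in the Overview example:

```cpp
// n_gpu_layers is a real field of llama_model_params: a value larger than
// the model's layer count offloads the whole model to the compiled-in GPU
// backend (CUDA, ROCm, Metal, or Vulkan); 0 keeps everything on the CPU.
llama_model_params mparams = llama_model_default_params();
mparams.n_gpu_layers = 99;
```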
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.