RUNLOCALAI
© 2026 runlocalai.co · Independently operated
How to compile llama.cpp from source

intermediate·30 min·By Fredoline Eruo
Target environment
  • Ubuntu 24.04 · Ollama 0.4.x
  • Windows 11 · Ollama 0.4.x
  • macOS 15 · Ollama 0.4.x
PREREQUISITES

Git, CMake, C++ compiler (GCC 11+ or Clang 15+), Python 3.8+
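
Before cloning, it can help to confirm the prerequisites resolve on PATH. The following is a minimal sketch, not part of llama.cpp: have_tool and missing_tools are hypothetical helper names, and c++ is assumed to be the command your compiler installs. It checks presence only, not the minimum versions listed above.

```shell
#!/bin/sh
# Report which build prerequisites are missing from PATH.
# Assumption: the C++ compiler is reachable as "c++"; adjust to g++ or
# clang++ if your system names it differently.

have_tool() {
    # Succeeds if the named command resolves on PATH.
    command -v "$1" >/dev/null 2>&1
}

missing_tools() {
    # Print each requested tool that is not on PATH, one per line.
    for tool in "$@"; do
        have_tool "$tool" || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools git cmake c++ python3)
if [ -z "$missing" ]; then
    echo "All prerequisites found on PATH."
else
    echo "Missing from PATH:"
    echo "$missing"
fi
```

Version checks (git --version, cmake --version, and so on) still need to be read by eye against the minimums above.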

What this does

Builds the llama.cpp library and companion CLI and server binaries from the upstream Git repository. The result is a set of native executables (llama-cli, llama-server, llama-quantize) compiled and optimized for the host system architecture.

Steps

  1. Clone the repository.

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    

    Expected output: repository cloned; llama.cpp/ directory created.

  2. Create and enter a build directory. Out-of-source builds keep the source tree clean.

    mkdir build && cd build
    

    Expected output: build/ directory exists.

  3. Configure CMake with the desired backend. To enable NVIDIA GPU support, add -DGGML_CUDA=ON (older checkouts spelled this option -DLLAMA_CUDA=ON). Omit the flag for a CPU-only build.

    cmake .. -DGGML_CUDA=ON -DLLAMA_BUILD_SERVER=ON -DLLAMA_BUILD_CLI=ON
    

    Expected output: -- Configuring done followed by -- Generating done.

  4. Compile with parallel jobs.

    cmake --build . --config Release -j$(nproc)
    

    Expected output: the build finishes without errors. With the Makefile generator, the last lines take the form [100%] Built target <target-name>.

  5. Verify the CLI binary runs.

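    ./bin/llama-cli --version
    

    Expected output: a version line of the form version: <build-number> (<commit-hash>), followed by the compiler and target the binary was built with. The binaries are written to bin/ inside the build directory.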
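
The configure and build steps above can be sketched as one script. This is an illustration under assumptions, not an upstream script: detect_jobs, backend_flag and build_llama are hypothetical helper names, build_llama is meant to be run from the repository root rather than from build/, and it needs a CMake recent enough to accept -S and -B. The CUDA option is passed only when nvcc is found on PATH, and GGML_CUDA is the option name in current upstream.

```shell
#!/bin/sh
# Helpers for the configure and build steps.

detect_jobs() {
    # Print the number of CPU cores to use for parallel compilation.
    # nproc covers Linux, sysctl covers macOS, and 1 is the fallback.
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
        sysctl -n hw.ncpu
    else
        echo 1
    fi
}

backend_flag() {
    # Print the CUDA option only when nvcc is on PATH; otherwise print
    # nothing, which leaves CMake to configure a build without CUDA.
    if command -v nvcc >/dev/null 2>&1; then
        echo "-DGGML_CUDA=ON"
    fi
}

build_llama() {
    # Run from the repository root. $1 is the build directory (default: build).
    build_dir=${1:-build}
    # $(backend_flag) is left unquoted so an empty result adds no argument.
    cmake -S . -B "$build_dir" $(backend_flag) &&
        cmake --build "$build_dir" --config Release -j "$(detect_jobs)"
}

# Show what the helpers would choose on this machine without building.
echo "parallel jobs: $(detect_jobs)"
echo "backend flag:  $(backend_flag)"
```

Calling build_llama with no arguments from the llama.cpp checkout configures and compiles into build/. On machines with limited RAM, replace $(detect_jobs) with a smaller fixed number.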

Verification

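./bin/llama-cli -m models/some-model.gguf -p "Hello world" -n 10 --no-display-prompt 2>/dev/null
# Run from the build directory; models/some-model.gguf is a placeholder for a GGUF file you have downloaded.
# Expected: model loads and outputs a 10-token completion without errors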
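
If no model file is at hand yet, a lighter check is to confirm that the three executables named in this guide exist. The following is a small sketch: missing_binaries is a hypothetical helper, and it assumes the build placed its executables in bin/ under the build directory. Pass a different directory as the first argument if yours landed elsewhere.

```shell
#!/bin/sh
# List expected binaries that are missing or not executable in a directory.
# Assumption: executables are written to bin/ inside the build directory.

missing_binaries() {
    dir=$1
    for name in llama-cli llama-server llama-quantize; do
        [ -x "$dir/$name" ] || printf '%s\n' "$name"
    done
}

bin_dir=${1:-bin}
missing=$(missing_binaries "$bin_dir")
if [ -z "$missing" ]; then
    echo "All expected binaries are present in $bin_dir/."
else
    echo "Not found in $bin_dir/:"
    echo "$missing"
fi
```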

Common failures

  • nvcc: command not found — CUDA not in PATH. Set export PATH=/usr/local/cuda/bin:$PATH before running cmake.
  • Header ggml.h not found — The checkout is incomplete. Run git submodule update --init --recursive to fetch any submodules, then reconfigure from a clean build directory.
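  • CUDA compute capability mismatch — Set -DCMAKE_CUDA_ARCHITECTURES to your GPU's compute capability written without the dot, for example 75 for Turing cards (compute capability 7.5) such as the RTX 20 series.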
  • Out of RAM during compilation — Reduce concurrency with -j4 on systems with limited RAM.
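  • Python bindings not built — The llama.cpp CMake build produces native binaries only, and -DLLAMA_PYTHON_BINDINGS=ON is not an option it recognizes. Python bindings are maintained separately, for example in the llama-cpp-python package; install that package instead of reconfiguring.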
  • Missing GLIBCXX symbols at runtime — The libstdc++ loaded at runtime is older than the one the binaries were built against. Upgrade the runtime libstdc++ package, or rebuild with the system's default compiler so the two versions match.
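
For the compute capability failure above, the value can often be read from the driver rather than looked up. The following is a sketch under assumptions: cap_to_arch and detect_cuda_arch are hypothetical helpers, and the compute_cap query field is assumed to be supported by your nvidia-smi (older drivers may not have it, in which case the script prints the fallback message).

```shell
#!/bin/sh
# Turn a compute capability such as "7.5" into the value CMake expects ("75").

cap_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

detect_cuda_arch() {
    # Assumes an nvidia-smi that supports the compute_cap query field.
    # Prints nothing if the tool is missing or the answer is not numeric.
    command -v nvidia-smi >/dev/null 2>&1 || return 0
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    case "$cap" in
        ''|*[!0-9.]*) return 0 ;;
    esac
    cap_to_arch "$cap"
}

arch=$(detect_cuda_arch)
if [ -n "$arch" ]; then
    echo "Suggested flag: -DCMAKE_CUDA_ARCHITECTURES=$arch"
else
    echo "No GPU detected via nvidia-smi; set CMAKE_CUDA_ARCHITECTURES by hand."
fi
```

Only the first GPU reported is used; on a machine with mixed cards, pick the value for the card you intend to run on.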

Related guides

  • How to run inference with llama.cpp server
  • How to quantize a model for llama.cpp
  • Course Local AI Fundamentals